The following bibliographic essay is a critical review of many sources 
on statistics of religious affiliation, including references to studies by 
social scientists that treat of or are closely related to religious affiliation. 
It is found that statistics of religious affiliation generally originate with 
the unstandardized records kept by clergymen or lay clerks in over 
300,000 local churches, who are for the most part untrained. Officials of 
national religious bodies probably receive and publish reports from 
most local churches, but a considerable proportion of these officers make 
public official reports that are only their own estimates. Periodic com- 
pilations of “the latest information” are noted. A brief summary of the 
U. S. Censuses of Religious Bodies is also made. A Church Distribution 
Study is described. Social scientists probably regard most current sta- 
tistics on religious affiliation as crude. The limitations and defects of 
these statistics have received relatively little documentary study by 
trained statisticians. 


INTRODUCTION 


This list of references and the accompanying text have been prepared as a 
brief guide to much of the literature on statistics of religious affiliation in 
the United States. The purpose is to provide systematic assistance to the ma- 
ture student and to the researcher who wishes both to understand sources and 
processes and also to interpret, or make a critical appraisal of, the kinds of in- 
formation within the scope of this bibliographic essay. 

Certain of the materials on religious affiliation here noted contain only 
reports of or summaries of figures, without explanation or interpretation. 
Others consider religious affiliation and other social data. Thus there are included 
among the selected references that follow not only those simply reporting 
figures, but also numerous titles, mainly by social scientists, which treat of 
the social significance of religious affiliation, or interpret trends, or make re- 
finements of the crude extensive data often reported, or report studies based 
upon field research by trained persons. While no claim is made that the list 
of titles is complete, an effort has been made, within the time available, to list 
titles fairly representative of the main types of publications. 

Note here is taken of sources on the origins, local and otherwise, of the sta- 
tistics of religious affiliation; on the federal censuses of religious bodies; on the 
various private publications that record series, in some instances for many 
years; on the periodic reporting of the figures gathered by religious bodies. 

Attention is given to the relatively meager documentation on the limitations, 
or defects, of both historical and current statistics of religious affiliation. Information 
on the location of depositories of historical sources is provided. And 
because comparisons with figures from other nations are sometimes desired, a 
few publications recording figures of many countries are listed. 

Some unpublished materials, including dissertations, are included. It has not 
been possible to give adequate attention to the many statistical publications 
of the numerous separate denominations, but advice is given concerning the 
locations of the headquarters and the statistical offices at which these materials 
are produced and from which they may be obtained. 

The compiler gratefully acknowledges valuable aid from H. S. Linfield, 
Jewish Statistical Bureau, New York; Rev. William J. Gibbons, S.J., lecturer 
in sociology at Fordham University; Thomas B. Kenedy, editor of the Official 
Catholic Directory; and the librarians of the American Jewish Committee and 
of Fordham University. The H. Paul Douglass Collection at the Bureau of 
Research and Survey, National Council of Churches, was searched for many 
of the titles included. 


1. ORIGINS OF STATISTICS OF RELIGIOUS AFFILIATION 


“The typical church parish or congregation is expected to maintain a local 
roll of members,” Douglass and Brunner record in their summary of 48 re- 
search projects published in 78 volumes [53]. “The turned-over corner of a 
card may distinguish the active from the inactive, and a blue pencil mark the 
resident from the non-resident. Someone in the church will have a list of 
Sunday-school pupils and this list may or may not show which of them are 
church members. Various membership lists will be found in the hands of their 
respective officers, but are seldom assembled as one list. The financial authori- 
ties of the church will have their subscription list and roll of other supporters. 
The regularity of attendance of individuals will rarely be recorded, and there 
will be little agreement as to what constitutes regularity. . . . Well-organized 
churches maintain additional lists of marginal adherents.” Beyond these there 
are even “remoter constituencies.” Whenever individuals have been interro- 
gated by the tens of thousands, as they have been in house-to-house canvasses, 
“only small proportions are found unwilling to declare themselves Protestant, 
Catholic, Jew,” or related to “some other historic creed.” 

“Except for scant data gathered in surveys, no records exist as to the age dis- 
tribution of church membership.” About one-seventh of the members of rural 
churches were not resident in the communities in which the churches studied 
were located. A later study by Vidich and Bensman of a community in up-state 
New York indicated that about one-fourth of the members of rural churches 
were non-resident [195]. Occupational distribution of the constituencies of the 
local churches is seldom available, Douglass and Brunner report. 

Statistics of religious affiliation thus originate with the numerous local insti- 
tutions [24]. Compilations of these statistics come either from reports obtained 
from records of the parishes or congregations or from estimates furnished by 
officers of local churches. The clergyman or the lay clerk forwards the local 
figures to a regional or national office of the religious body. The local records 
are kept on standard forms as determined by a national body in certain de- 
nominations. In other bodies the records are kept in accordance with the wishes 
of the local churches. These local records are kept by the clergyman or the lay 
clerk or by both. For the most part both the clergymen and the lay officers are 
untrained in the keeping of statistical records, or have only the most elementary 
knowledge or experience of it. 

Religious statistics are gathered by the various religious bodies for their own 
purposes, and there is thus wide variety in the kind of information sought 
locally. However, the bodies (with a few exceptions considered later) make 
reports of the persons affiliated, and these are the main concern in this essay. 

The religious bodies make their own definitions of membership or affiliation, 
and accordingly there are marked differences in accordance with the polity of 
the institutions [116]. The definitions are also sometimes changed, but it is 
understood that there have been no major changes since the Census of Reli- 
gious Bodies, 1926. The main practices since that date are as follows: The 
Eastern Churches generally report estimates of the total number of persons 
within the cultural or nationality group served. The Jewish Congregations 
report on the number of Jews in communities having congregations. The 
Roman Catholic Church, the Lutheran bodies, the Protestant Episcopal 
Church, the Moravian bodies, and a few others report as members the total 
number of baptized persons, including infants. Most Protestant bodies report 
as members those persons who have attained full membership, usually at about 
age 13. 


2. COMPILATIONS OF THE STATISTICS 


From time to time most religious bodies compile reports or estimates from 
their local parishes or congregations. Probably about half the national bodies 
receive reports from their local churches annually and then issue the figures to 
their constituencies and to the public. The other national bodies report their 
statistics at irregular intervals [21], and these reports sometimes consist 
solely of estimates by national officers. 

It seems probable that the bodies that report annually the figures received 
systematically from their local churches are mainly the larger denominations. 
Thus considerably more than half the local churches appear to report annually 
to some office of their denomination [21]. 

Some denominations have published statistics annually for many years. In 
the forthcoming Historical Statistics [21] figures are reported for the Roman 
Catholic Church from 1891 to 1956; for the Presbyterian Church in the U.S.A. 
from 1826 to 1956; for the Protestant Episcopal Church from 1927 to 1956; 
for the Methodist Church from 1790 to 1956; for the Seventh-Day Adventists 
from 1907 to 1956. 

Comprehensive national compilations of some church statistics have been 
made at intervals since 1850 by the Bureau of the Census; and during recent 
decades by the Christian Herald, New York and by the National Council of 
Churches, New York in the Yearbook of American Churches, a reference work 
issued since 1916 [116]. The first edition of the Yearbook was published in 1916; 
it appeared irregularly until 1951 when it became an annual publication. 

Prior to 1890 the Bureau of the Census did not gather figures on religious 
affiliation but did ascertain the number of local institutions, their seating ca- 
pacity, and the value of their edifices. In 1890, the Bureau gathered figures on 
affiliation and conducted the Censuses of Religious Bodies in 1906, 1916, 1926, 
and 1936, described more fully below. 

Historical Statistics [21] contains the totals reported in 28 compilations of 
comprehensive national church membership or affiliation figures between 1890 
and 1956. Of these 28, 12 were made by the Christian Herald, 11 by the Year- 
book of American Churches [116], and 5 by the Bureau of the Census. Since 
1950, annual figures have been published in the Yearbook. The figures are 
gathered by means of a brief form mailed to the statistical offices of all the 
known religious bodies. The Yearbook then publishes “the latest information” 
received from these officials. Because many bodies report at irregular intervals, 
these annual figures are not directly comparable with one another. 

During recent years “the latest information” has appeared in the Yearbook 
[116] for over 250 bodies annually, and usually about 10 or 12 bodies do not 
have figures or decline to furnish them. Those declining to furnish figures are 
believed to be relatively small, with the exception of the Church of Christ, 
Scientist, with headquarters in Boston. This body has a regulation forbidding 
the numbering of the people and the publication of statistics of affiliation. The 
local churches of this body reported a total membership of 268,915 persons in 
the Census of Religious Bodies, 1936 [27], but there has been no report since 
that date. Since 1955, the total number of persons reported to be affiliated with 
all religious bodies in Continental United States has exceeded 100,000,000. It 
would appear that the reports assembled annually must include all but perhaps 
about one per cent (or less) of the total religious affiliation claimed by the various 
bodies. 

In the current Yearbook [116] reports of “the latest information” from 255 
bodies are published, out of a total of 267 bodies listed. Of the 255 reports, 
about 70 are given in round numbers by the official statisticians of the respec- 
tive denominations. These bodies in all probability either report estimates in 
round numbers received from their parishes or congregations, or give estimates 
made by national officers. 
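
The proportions implied by these counts can be worked out directly. The following is only a minimal sketch in Python: the three counts are those quoted above, while the function names and the rule that treats any exact multiple of 1,000 as a "round number" are assumptions made for illustration, not a method used by the Yearbook.

```python
# Counts quoted in the text for the current Yearbook.
BODIES_LISTED = 267
BODIES_REPORTING = 255
ROUND_NUMBER_REPORTS = 70  # the text says "about 70"


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def is_round_number(figure, unit=1000):
    """Assumed heuristic: an exact multiple of `unit` is treated as a round number."""
    return figure > 0 and figure % unit == 0


reporting_pct = percent(BODIES_REPORTING, BODIES_LISTED)
round_pct = percent(ROUND_NUMBER_REPORTS, BODIES_REPORTING)

print(f"Bodies reporting: {reporting_pct:.1f} per cent of those listed")
print(f"Reports in round numbers: {round_pct:.1f} per cent of reports")
# The 1936 census figure for the Church of Christ, Scientist, is not a round number.
print(is_round_number(268_915))
```

A rule of this kind can flag only figures that look like estimates; it cannot show whether a round figure was estimated locally or by national officers.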

Since 1951 the Yearbook [116] has published religious affiliation reported by 
six large groups: Roman Catholic, Jewish, Protestant, Eastern Orthodox, 
Buddhist, Old Catholic. These totals also appear in Historical Statistics [21]. 
The Yearbook [116] also publishes certain comparisons of religious affiliation 
with total population. 

A special article on the controversial question of “The Children and Church 
Membership,” summing up what is known about persons under age 13 in the 
statistics, appears in the Yearbook for 1959, published September, 1958 
[116]. 

Many of the larger bodies issue annual statistical reports from their head- 
quarters which are listed in the Yearbook [116]. 

Since 1954 the annual Statistical Abstract of the United States [28] has re- 
printed much of the material on religious affiliation gathered by the Yearbook. 

Tabulations on trends in church membership, by groups of religious bodies, 
revealing considerable differences among well-known denominations, down to 
1952, have appeared in Information Service [98, 99]. 

Mead’s Handbook of Denominations [133] is a ready source for both statistics 
and concise accounts of the history and programs of the religious bodies in the 
United States. 

The Official Catholic Directory [110] publishes annual reports of statistics by 
dioceses of the Roman Catholic Church, and in recent years has also given fig- 
ures by states. The Directory lists every parish with the street address, and the 
county in which it is located, but does not publish membership by parishes. 
The experience of the publishers of the Directory, and an account of prior publi- 
cations, are found in a recent article by Kenedy, the present editor [111]. For 
a discussion of Roman Catholic figures prior to 1900 a book by Shaughnessy 
may be consulted [168]. 

Linfield has concisely reviewed statistics of Jewish Congregations from 1850 
through 1937 [125], and presented data on Jewish communities in the United 
States in 1940 [124]. The annual American Jewish Yearbook contains estimates 
of Jews in various communities, and reviews of activities of the congregations 
and their national organizations [64]. Kertzer has interpreted the recent sub- 
urban and other trends among Jewish congregations [112]. 

Anderson has written a brief study of Eastern Orthodox statistics and activi- 
ties [5]. Life interprets the Adventists, Pentecostal Churches, Churches of 
God, the Nazarenes, the Churches of Christ, and Jehovah’s Witnesses as a 
potent “third force in Christendom,” next to Protestantism’s older and established 
bodies and to Roman Catholicism [123]. Van Dusen interprets the significance 
of the Third Force for others in organized religion [198]. 

O’Dea, a sociologist and a Roman Catholic, has made a thorough study of 
the Latter-Day Saints, popularly called Mormons [143]. Braden and Johnson 
interpret Christian Science [16, 103]. Effendi writes a systematic account of 
the Baha’i faith [56]. The much-publicized Jehovah’s Witnesses are described 
by Cohn [33] and Stroup [181], while the Witnesses’ own Yearbook records 
their statistics and activities [104]. 

The books by Herman Weber here noted contain his historical studies of 
church membership data [205, 206]. The Interchurch World Movement’s volume 
listed includes much information on the situation of the Protestant 
churches after World War I [102]. 


3. FEDERAL CENSUSES OF RELIGIOUS BODIES 


The Censuses of Religious Bodies conducted by the Bureau of the Census 
[24, 25, 26, 27] have been regarded as of particular value by church statisti- 
cians. They were based upon reports received directly by the Bureau from the 
pastors or clerks of the parishes or congregations. They collected national data 
and distributed the information by states, counties, and cities. Private compilers, 
on the other hand, and the denominations themselves, have seldom been 
able to present data by geographical units. The Censuses were made by one 
agency with uniform methods. They gathered data for those denominations 
that were not compiling figures for themselves. They presented statistics for 
churches in rural and in urban communities. They provided some check upon 
the private compilations made by denominations or others. With very few ex- 
ceptions the official statisticians of the religious bodies recognized the value of 
the Censuses, and approved this function of a federal government agency. This 
summary of opinion, and other relevant data, appear in a brief history of the 
Census experience running to about 1,600 words, “The Rise and Fall of the 
Census of Religious Bodies,” in Information Service [95]. A systematic account 
of the Census of 1936, the last one, including the extensive opposition among 
local churches, was also given in Information Service [94]. In 1936 there was a 
compilation by the Christian Herald, which recorded about 20 per cent more 
local churches than reported to the Bureau of the Census. This was in sharp 
contrast to the experience in 1926, when about the same number of local 
churches reported to the Bureau of the Census as to the Christian Herald. 

In 1890, as noted above, the Bureau, as part of the Census of Population, 
recorded membership of local churches, value of edifices, number of clergymen. 
Then in 1906 began the process of a separate census made by means of a form 
mailed to the pastors and clerks of the parishes or congregations, continued 
until 1936 [24, 25, 26, 27]. Variations in the basis of reporting, or of definitions 
of membership, are recorded in the volumes, and also in Historical Statistics 
[21]. Students of church statistics would probably regard that of 1926 as the 
most complete or satisfactory. A Census of Religious Bodies was begun in 
1946 but after the work was in process Congress declined to make the appropri- 
ation necessary for completing the process. In 1956, the Administration made 
no recommendation for an appropriation, and apparently no member of Con- 
gress was interested in raising a question concerning the matter. Widespread 
indifference in the local churches is also noted in the brief history cited [95]. 

An informing summary of the 1926 Census of Religious Bodies appeared in 
Fry’s book, The U. S. Looks at Its Churches [66]. 


4. SURVEY OF RELIGION OF AMERICAN CIVILIANS 


For some decades there has been an interest, widely in Roman Catholic 
circles and to a considerably lesser extent among Protestant officials, in the 
inclusion of a question on religion in the decennial Census of Population. In 
1956 the Bureau of the Census began a consultation among many agencies con- 
cerning the inclusion of a question, “What is your religion?” in the forthcoming 
population census of 1960, as part of the usual inquiries prior to the decennial 
enterprise. The question was also asked in a few localities, and again in March, 
1957, of a sample of persons over 14 years of age in 35,000 households in all 
parts of the nation. (The results of the latter survey are noted below.) It was 
indicated by Census officials that if the question were used in 1960, it would 
probably be asked only of a sample of 20 per cent of the households enumerated 
[101]. 

Considerable discussion of the proposal went on in church circles and elsewhere 
during 1957. As summarized in Information Service [96], Roman Catholic 
officials and press were, with one exception, in favor of the proposal; Jewish 
agencies and press, again with one exception, were opposed; while Protestant 
officials and press were apparently sharply divided. The over-riding considera- 
tion, among those opposed, was apparently that of religious liberty—the 
freedom of the individual in relation to the power of the government. The 
journal edited by Roman Catholic laymen, The Commonweal, New York, re- 
garded the asking of the question as an invasion of privacy—this seems to have 
been the lone Catholic dissenting voice. 

The proposal was favored by those who felt there was some value in learning 
the “religious leanings” of the people, by localities, states, and regions. It was 
opposed by some persons with technical competence in research as being of 
little value for research purposes because the replies would indicate both prefer- 
ence and affiliation, with no distinction between them. 

After careful study of the discussion, the director of the Bureau of the 
Census issued on December 12, 1957, a statement [22], summed up in the 
heading “1960 Census Will Not Ask Question on Religion.” The primary rea- 
son, Robert W. Burgess said, “was recognition that at this time a considerable 
number of persons would be reluctant to answer such a question in the Census 
where a reply is mandatory. Under the circumstances it was not believed that 
the value of the statistics based on this question would be great enough to justi- 
fy overriding such an attitude. Cost factors were also a consideration.” In the 
same statement Dr. Burgess said that the decision did not preclude inclusion 
of some such question in later Censuses or the publication of information ob- 
tained in voluntary surveys. He said that one survey of this type including 
persons over 14 years of age in 35,000 households would be published. 

On February 2, 1958, appeared the results of this inquiry, “Religion Re- 
ported by the Civilian Population of the United States: March, 1957” [23]. 
For example, two out of every three persons 14 years of age and over regarded 
themselves as Protestant and one out of every four as Roman Catholic. The 
children under 14 years in these households were also enumerated. More 
women than men were reported for the major religious groups. Figures were 
also presented by color, region of residence, urban and rural residence and age. 
It was stated that the results are not comparable with the official reports of 
membership made by the various religious bodies, because the latter refer only 
to formal affiliation. Ninety-six per cent of the respondents reported a religion, 
3 per cent stated that they had no religion, and 1 per cent “made no report on 
religion.” 


5. A CHURCH DISTRIBUTION STUDY 


A total of 80 bulletins, in five series, was published between September, 
1956, and early 1958, reporting the results of a study carried on in cooperation 
with 114 religious bodies by the Bureau of Research and Survey, National 
Council of Churches [29]. The general title is Churches and Church Membership 
in the United States; the subtitle is “An Enumeration and Analysis by Counties, 
States and Regions.” The study was supported by a grant from a foundation 
and was authorized by the Division of Home Missions of the National Council 
of Churches. 

The study is an effort to gather the statistics of churches and church mem- 
bership for the year 1952 and to relate these findings to certain data in the 
1950 United States Census of Population. All of the 251 religious bodies listed 
in the 1953 Yearbook of American Churches were invited to participate. One 
hundred and fourteen bodies cooperated. While most of the 137 religious bodies 
not participating are relatively small, some are large. Among the large bodies 
not included are the large Negro denominations, the Churches of Christ, and 
the Church of Christ, Scientist. The total number of members reported was 
equal to over 49 per cent of the total population of the United States in 1950. 

A listing of Bulletins follows: 

Series A. Major Faiths by Regions, Divisions, and States. (4 bulletins.) 

Series B. Denominational Statistics by Regions, Divisions, and States. 
(8 bulletins.) 

Series C. Denominational Statistics by States and Counties—the basic 
data. (59 bulletins.) 

Series D. Denominational Statistics by Standard Metropolitan Areas. (6 
bulletins.) 

Series E. Analyses of Socio-Economic Characteristics. (3 bulletins.) 


The entire plan is fully described in Series A, No. 1. 

Protestant strength is greatest in the South, and least in the Northeast, while 
the Roman Catholic situation is the reverse. Wide variations occur among the 
states when church membership is compared with population. Among other 
things the study revealed marked differences between the “metropolitan and 
non-metropolitan worlds.” In the metropolitan population were 57 per cent 
of the people, according to the 1950 Census of Population, and here were found 
46 per cent of the Protestant church membership and 75 per cent of the Roman 
Catholic membership. In the non-metropolitan areas were 43 per cent of the 
U. S. population, 54 per cent of the Protestant church membership, and 25 per 
cent of the Roman Catholic membership. 
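
One way to read these percentages is to divide each group's share of membership in an area by that area's share of the total population. The following is a minimal sketch in Python under that assumption: the percentages are those quoted above, but the "representation ratio" and all names in the code are illustrative and are not taken from the bulletins. A ratio above 1 means the group is more concentrated in the area than the population at large.

```python
# Percentage shares quoted in the text (1950 population, 1952 membership).
POPULATION_SHARE = {"metropolitan": 57, "non-metropolitan": 43}
MEMBERSHIP_SHARE = {
    "Protestant": {"metropolitan": 46, "non-metropolitan": 54},
    "Roman Catholic": {"metropolitan": 75, "non-metropolitan": 25},
}


def representation_ratios(membership_share, population_share):
    """Return membership share divided by population share for each group and area."""
    return {
        (group, area): share / population_share[area]
        for group, shares in membership_share.items()
        for area, share in shares.items()
    }


ratios = representation_ratios(MEMBERSHIP_SHARE, POPULATION_SHARE)
for (group, area), ratio in ratios.items():
    print(f"{group:15s} {area:17s} {ratio:.2f}")
```

On these figures the Protestant ratio is below 1 in the metropolitan areas and above 1 outside them, and the Roman Catholic ratios run the other way, which restates the contrast between the two "worlds" in a single index.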

Of Series E, the final three bulletins on “Socio-economic characteristics,” it 
is stated in the Introduction, “the Bureau staff came to the conclusion that the 
per cent of urbanization of a county was the most important statistic to analyze 
in connection with the churches located in that county and their membership. 
Many other characteristics, in particular median school years completed and 
median income of the population, are closely related to degrees of urbanization, 
but the most important point was that it applies directly both to the county 
and the church members themselves. The membership of an individual church 
may be older or younger, richer or poorer than the county median, but a church 
in a metropolitan or rural county can be presumed to draw its membership 
from that same county.” In Series E, “fifteen major Protestant denominations 
having memberships of over 500,000 have been analyzed, as well as the major 
faiths.” 


6. SOCIAL STUDIES RELATED TO RELIGIOUS AFFILIATION 


In the comprehensive work already mentioned, The Protestant Church as a 
Social Institution [53], Douglass and Brunner in 1935 summarized 48 research 
projects published in 78 volumes, stating that all of the projects “may more 
or less aptly be described as social studies.” The projects were all financed by 
the Institute of Social and Religious Research, New York, which also published 
most of the studies. Douglass and Brunner note that social scientists com- 
plained because “statistics exist on so few points.” The statistics of church 
growth were often challenged, they said, also remarking that there were ob- 
servers who said that “the church had little to do with religion.” Social scien- 
tists were not prone, it was stated, to regard size of an institution alone as a 
criterion; they were accustomed to search for refined techniques for appraisal of 
institutions. “Ecclesiastics are nevertheless very sensitive to membership gains 
and losses, while the common man unhesitatingly judges the church in large 
measure by its institutional size.” 

Reviewing studies carried on over a 15-year period, Douglass and Brunner 
concluded that “no one of the well-established denominations appears to grow 
faster [in cities] than another because of superior ability or quality of church 
life or method as such. . . . The recent growth of cities has been chiefly due to 
rural migration. Denominations which have large rural constituencies naturally 
grow faster than those which do not. Denominations grow also by conservation 
of members. . . . The great ecclesiastical families still continue to dominate 
the religious situation numerically in much the same proportion from decade 
to decade. 

“As between the major faiths, the Protestant, Roman Catholic, and Jewish, 
relatively little direct transfer of adherents takes place.” 

But the data from 78 volumes were “too limited to justify any really funda- 
mental conclusions.” The authors only “suggest certain preliminary insights 
and raise certain problems.” Dependable inferences must await attention to 
“more aspects of church life.” 

In American Society: Urban and Rural Patterns [19], Brunner and Hallen- 
beck devote a chapter to “Organizations of Religion” in which statistics of 
religious affiliation are summarized. The churches are described as “local insti- 
tutions,” and in the United States local situations vary greatly. “There are 
fewer churches per 1,000 people in urban than in rural areas. . . . The density 
of population is so much higher in cities. . . . ” There are “many church closings 
and many new organizations.” Instances of the changing fortunes of local 
churches in both rural and urban areas are cited. The chapter is based in large 
part on Hallenbeck’s American Urban Communities [78], and Kolb and Brun- 
ner’s A Study of Rural Society [114]. 

In Growth of a Science [17], a history of rural sociological research in the 
United States, Brunner records that 16 studies of the rural church had been 
completed in 12 states by the year 1916. He writes that “perhaps no other 
rural social institution has been as much studied as the church.” Most studies, 
however, have been “descriptive one-dimensional social photographs of specific 
situations at a moment of time.” Among generalizations supported by numer- 
ous studies are these: “In a community with declining population church mem- 
bership drops more rapidly than the population, whereas in a growing com- 
munity its rate of gain lags behind that of the community.” “The level of sup- 
port of both open country and village churches is positively correlated with 
such indices as the value of farms, farm income, and per capita retail sales.” 
Brunner reviews a representative group of studies of rural churches by rural 
social scientists, many of them on the faculties of the state agricultural colleges. 

Morse and Brunner in The Town and Country Church in the United States 
[135] summed up data from 175 counties gathered in the early 1920’s as part of 
the studies of the Interchurch World Movement and the Institute of Social and 
Religious Research. Douglass presented the main findings of studies of 1,231 
rural churches during the period, 1941-49, in Some Protestant Churches in Rural 
America [49], noting an average membership of 186 persons, 40 per cent of the 
churches declining in membership and “extraordinary contrasts . . . in the 
realm of the rural church.” In Douglass’ study, “Some Iowa Rural Churches” 
[48], he arranged local churches into three groups according to the environ- 
mental situation. 

Among numerous recent studies relating to the rural church in particular 
areas, the following are listed: Anderson [4], Coughenour and Hepple [35, 85], 
Garnett [71], Hooker [87, 88], Hypes [92], Whitman and Lively [209], 
Vidich and Bensman [195]. Maloney and Perry [130] considered both rural 
and urban churches in two Illinois counties. Tripp [191] emphasized the special 
problems of “the marginal rural church” in the Congregational Christian 
Churches. 

1,000 City Churches by Douglass [45] summarized primary data from sched- 
ules for 1,044 local Protestant churches, mainly in cities of over 100,000 popu- 
lation in 1922. The institutions were believed to be representative of the 
“recognized and well-established” denominations. The churches were classified 
into numerous types. In Douglass’ The Church in the Changing City [43], 26 
“case studies illustrating adaptation” were made on an intensive basis and statistics 
for each of the 26 local churches were presented in considerable detail. 
Certain of Douglass’ generalizations regarding city churches are considered by 
Chapin in Contemporary American Institutions [32]. 

Twenty-six studies of churches in 23 metropolitan areas, made between 1940 
and 1950, were summarized by Douglass in Some Protestant Churches in Urban 
America [50]. They included 5,744 local churches with a membership of 
2,331,555, and included urban, suburban, and adjacent rural churches, with 
the urban most numerous. 

Among the many studies of specific churches in urban communities those 
here listed are by the Lynds [128, 129], Douglass [47, 51], Barry [8], Berry 
[12], Blackwell [14], Lee [121], Perry [148], Ruoss and Odell [126], Salisbury 
[163], Sanderson [164, 165], Shippey [170], Stotts [179], Trimble [190], 
Villaume [196]. 

Among studies of suburban church situations, which are apparently less 
numerous than the rural or urban, the following have been listed: Douglass 
[52], Francis and others [65], Perry [147], Thorne [188], Trimble [189], 
Villaume [197]. 

Cooperative church activities are treated in titles by Douglass [46, 47]
and Hallenbeck [79, 80].

Sklare’s book of extensive readings, The Jews: Social Patterns of an American
Group [173], includes a section on “The Jewish Religion” which contains a 
number of papers on issues of importance in the Jewish community. Among 
studies of specific urban Jewish communities which have reference to or impli-
cations for Judaism are the following by Antonovsky [7], Bigman [13],
Massarik [131], O’Brien [142], Polsky [150], Sklare and Vosk [174],
Sutker [182]. The suburban situations with specific reference to Jewish com- 
munities have been studied by Gans [69, 70], Gersh [72], Glustrom [76], 
Glazer [74]. Two studies considering especially the relation of the individual 
to the community are those by Kaplan [108] and Rossman [161]. 

Comprehensive studies of Roman Catholic parishes have been provided by 
Fichter [59, 60, 61, 63]. In Social Relations in the Urban Parish [60] he in- 
cluded data on 21,754 white baptized Roman Catholics. Urban mobility and 
the complexity of community relationships affect parish functions. Urban 
American life is something that the theologians and moral teachers of the past 
never experienced. Thus new difficulties confront both the priest in charge and 
the families of the parish. Many members become preoccupied with economic 
problems and varied competing interests. Thirty-eight per cent of the members 
included were classified as “dormant.” For married persons, those in their 
thirties showed the lowest point of formal religious participation. Fichter quotes 
“experienced priests” as saying that Roman Catholic teaching against birth 
prevention is a factor discouraging religious observances. The Church de- 
clares that the member practicing “artificial birth control” may not receive the 
sacraments. But economic pressures constantly indicate to a married couple 
that two or three children “are enough.” 

Nuesse and Harte have edited a symposium on The Sociology of the Parish 
[141]. Specific parishes or areas are studied by Coogan [34], Donovan [41], 
Kelly [109], Schuyler [167]. 

Religious affiliation in relation to class has engaged the attention of many 
scholars. Moberg in a bibliographic essay, Social Class and the Churches [134], 
sums up “major generalizations” based upon a study of 72 references that he 
lists. His titles include a number that deal with the problems of defining social 
classes. He considers social classes and church membership and church par- 
ticipation, noting in a few instances “contradictory evidence.” His references 
include both rural and urban studies. In general “church membership and
participation have been found to be related to the pattern of social stratifica- 
tion.” 

A comprehensive article by Shippey, “Sociological Forms of Religious Ex- 
pression in Western Christianity” [171] interprets the works of Troeltsch 
[192, 193], H. R. Niebuhr [139, 140], and numerous other writers, some of 
whose works are herein listed. Shippey notes that the subject is receiving atten- 
tion from philosophers, theologians, historians, and sociologists. “Their pub- 
lished monographs and scholarly articles disclose more than a casual interest in 
the possible interrelationship between forms of religious expression and forms 
of social structure.” Another contribution by Shippey considers The City 
Church and Social Class [169]. An early writer on north European experience 
was Max Weber [207], whose theses are constantly considered by modern 
social scientists in the United States. 

Dynes considers “church-sect typology and socio-economic status” in Colum- 
bus, Ohio [55], stating that “significant relationships were found between the 
acceptance of the sect type of organization and lower socio-economic status and
between the acceptance of the Church type of organization and higher socio- 
economic status.” Johnson furnishes “a critical appraisal of the church-sect 
typology” [105]. 

Among numerous writings, those of Bendix and Lipset [11], Cantril [30], 
Hunter [91], Kahl [107], and Yinger [213] contain materials for the student 
of class and religion. Various aspects of racial segregation in relation to religion 
are treated by Culver [36], Drake and Cayton [42], Fukuyama [67], Loescher 
[127], Mays and Nicholson [132]. Among works on social class in specific com-
munities are those by Havighurst and Morgan [82], Hollingshead [86],
Warner and others [201, 202, 203, 204], and West [208]. The last name is a
pen-name of an anthropologist who studied a rural community in the Middle 
West and reported that each church is made up of a particular class. 

Turning to systematic treatises on the sociology of religion, reference is made 
here to the books of Hoult [89], Kilpatrick [113], Parsons [146], Wach [199], 
Yinger [214]; an article by O’Dea [144], and a booklet by the Fordham Depart- 
ment of Political Philosophy and the Social Sciences [39]. 

Will Herberg’s “sociological essay,” Protestant—Catholic—Jew [83], empha- 
sizes secular elements in American religious institutions and religious elements 
in secular life, a subject also dealt with by Pfautz [149]. Hertzler in his Social 
Institutions [84] writes that “religion is a universal attribute of man at every 
stage of his culture and in every period of history.” Fichter, whose parish studies 
were noted above, has also written a Sociology [62]. Fukuyama discusses Some 
Implications of the Sociology of Knowledge for the Scientific Study of Religion 
[68]. The career of H. Paul Douglass as a “pioneer researchist in the sociology 
of religion” is interpreted by Brunner [18]. 

Acceptance is here given to a statement by W. J. Goode, in a foreword to 
Hoult’s The Sociology of Religion [89]: “The theory of this field remains some- 
what undeveloped. Perhaps the most important body of theoretical analysis 
has come from the social anthropologists, or from those sociologists who have 
analyzed societies from a broad perspective in the manner of anthropology.” 
Accordingly, titles by Boas [15], Benedict [10], Davis and others [38], Pow-
dermaker [152], Rivers [159], and Wissler [210] are here included. Wissler
[210] notes that religious practices are a main heading in the culture scheme 
and that religion is not a dominant characteristic of our culture. 

Attitudes and opinions in relation to religious affiliation have received con- 
siderable attention. Adorno and others in The Authoritarian Personality [1] 
state that “the relationship between prejudice and religion played a relatively 
minor role in our research. ... Religion does not play such a decisive role 
within the frame of mind of most people as it once did . . . ” Stouffer in Com- 
munism, Conformity, and Civil Liberties [180] found churchgoers less tolerant 
than people who did not attend church. Remmers and Radler have sampled 
teenagers’ attitudes [155, 156]. Allensmith’s materials [2] indicated variations 
in social and economic opinions by members of religious groups sampled. Trott 
and Sanderson [194], and Glock and Ringer [75, 158], interpret opinions of 
persons in churches. The latter studies were confined to a sample in the
Protestant Episcopal Church; they found differences in opinions between clergy and
laity and, among the laity, that the most committed were the “most reluctant to have
the church enter into and engage in political activity.” Student attitudes and
opinions are dealt with in the references to studies by Bender [9], applying
earlier methods described by the Allports [3]; Dudycha [54]; Kuhlen and
Arnold [115]; Nelson [137]. Walters [200] studied the religious backgrounds of 
a group of alcoholics. 

Among a great many historical writings there have been selected a group 
thought to be related to the subject of this essay. Schneider [166] presents by a
combination of methods an informal history of religious institutions in the
U. S., 1900-1950. Lerner’s comprehensive book, America as a Civilization
[122], includes appraisals of religion and religious affiliation. Roman Catholic 
experience is documented in the Catholic Encyclopedia [31]; the histories by 
Curran [37], Ellis [57, 58], Putz [154], Roemer [160]; Witte, on the Catholic 
Church in rural areas [211]. Protestant writers here noted are Latourette [119, 
120], Nicholas [138], Osborn [145], Sweet [183]; and among the Jewish au- 
thors, Glazer [73] and Sklare [172]. Rich interprets what has been called the 
American “rural church movement” [157]. Tawney [185] writes that historical 
explorations of the ambiguous region where religious, ethical, and economic 
interests meet “form a literature extensive and sometimes learned and acute”; 
he reviews a number of titles. His Religion and the Rise of Capitalism [186] is 
here listed, and in The Times Literary Supplement cited [187] are references to 
Tawney’s own contributions and many other aspects of “historical writing.” 

Much that is known about the economic aspects of organized religion, 
which receives annually about 1 per cent of the national income and about one- 
half of the people’s philanthropy, is summed up in the volume by Dewhurst 
and Associates [40]. Andrews [6] in a study of philanthropic giving reports 
that except for the very wealthy the highest rates for giving were among those 
with incomes of less than $3,000 in 1943. Hoyt and others [90] consider growth 
of income among the American people as an ethical issue. Johnson and Acker- 
man [106] review the practices of churches as employers, investors, and 
money raisers. Harris and Ackerman [81] interpret the family farm in relation 
to the town and country church. Pope wrote of the role of churches in the severe 
labor conflicts in Gastonia, N. C. [151], stating that no general theory of social 
scientists or philosophers is adequate to interpret the developments in the local 
churches there and that church leaders usually lack knowledge of social and 
economic affairs. 


7. LIMITATIONS OF RELIGIOUS STATISTICS 


The limitations and defects of church statistics are surely apparent when 
one notes their origins and the processes whereby they are compiled (as has 
here been attempted), and the large role of untrained persons working with 
varied records in numerous local situations. They are also apparent when one 
recalls that there have been, since the Census of Religious Bodies, 1926, no 
adequate compilations made by one agency using uniform methods. The Year- 
book of American Churches [116] simply compiles “the latest information,” some 
of it a number of years old, and states that ordinarily there has been no oppor- 
tunity to verify the information. 

The frequent estimates, whether by local or national officers, may or may
not be carefully made; they are apparently not systematically checked by any 
independent person or agency. 

But just as popular statements on the significance of statistics of religious
affiliation are usually based upon impression rather than documentation, so
statements on the defects of church statistics are in part based upon casual
observation or impression. The mobility of the people is causing not only great problems for
the priest, rabbi, or minister in relation to his people—it may also be adding to 
the statistical problems of overburdened clergy. Many persons may be on the 
rolls of more than one local church. At present there is no way of ascertaining 
the degree of duplication. 

As Douglass and Brunner pointed out [53], statistics of religious affiliation 
could only be well interpreted if they were accompanied by documentation on 
involvement of persons, such as attendance or financial contributions or on 
other evidence of participation. Attendance is seldom recorded in churches. 
The Yearbook of American Churches [116] sums up polls of samples of adults 
made from time to time. It publishes financial data from only about 50 of the 
religious bodies, but there are no reports on the proportion of members that 
contribute. 

Refinements and qualifications of statistics come from trained field workers, 
but in most instances these workers are dependent upon what is told them by 
local clergy or lay leaders, or upon the state of the local church records. Brief 
documentations of the limitations and defects of the world of church statistics 
—one that most social scientists would generally label “crude”—are found in 
two references cited [93, 117]. 


8. NOTES ON DATA FROM OTHER NATIONS 


This essay is, of course, mainly about the United States, but requests for 
figures from other nations are often encountered in the U.S.A., hence a few 
international references are given here. The Demographic Yearbook, 1956, pub- 
lished 1957, [176], includes the returns from the 32 national censuses made 
around 1950 that contained a question on religion. In 20 other nations taking
censuses at that time a question on religion was not asked. Other relevant data 
from the Statistical Office of the United Nations with respect to these censuses 
will be found in references [175] and [177]. A critical review of the state population
censuses by faiths up to World War II was given by Lindfield [126]. A list of 
the precise questions asked in these censuses and instructions to enumerators 
are found in Information Service [97].

Reports from both state censuses and private compilations for most nations 
are found in World Christian Handbook [77], and the annual Statesman’s Year-
book [178]. The National Catholic Almanac, an annual [136], lists the Roman
Catholic population by nations. Price lists recent Christian and missionary 
statistics throughout the world [153]. World Religions [118] lists figures for 
major religions and “families” of Christian denominations throughout the 
world, but with detail for Great Britain, Canada, and the United States. 


9. DEPOSITORIES OF HISTORICAL MATERIALS


Sweet [184] listed numerous depositories of church history materials and 
sources in an article in Church History in 1939 and has since revised and en- 
larged the number in annual statements in the Yearbook of American Churches 
[116]. The Yearbook for 1959 contained 83 such depositories, for most large 
religious bodies. Among those listed are American Jewish Archives, Cincinnati 
20, Ohio; Catholic Archives of America, Notre Dame University, South Bend, 
Ind.; and the relatively numerous collections from various Protestant denom- 
inations in the libraries of the University of Chicago, Chicago, Ill.; Union 
Theological Seminary, New York; Yale Divinity School, New Haven; the Mis- 
sionary Research Library, New York; the Bureau of Research and Survey, 
National Council of Churches, New York. Sweet’s list includes sources on the 
following Protestant denominations and families or groups of denominations: 
Adventists, Baptists, Brethren, Congregationalists, Disciples, Episcopal, 
Friends, Latter-day Saints, Lutherans, Mennonites, Methodists, Moravians, 
Presbyterians, Reformed, Unitarians, Universalists. 


INCREASE IN RENT OF DWELLING UNITS FROM 1940 TO 1950


Margaret G. Reid*
University of Chicago 


Average rent paid by nonfarm tenants increased greatly from 1940 
to 1950. This change is described by the respective censuses of housing. 
Increased consumption and increased price both played a part. This 
conclusion is based on evidence presented in this paper. It examines 
variation among places in increase in rent paid in relation to change in 
the quantity and quality of housing, change in the price of housing as 
measured by the rent component of the Consumer Price Index of the 
U. S. Bureau of Labor Statistics and by the importance of new con-
struction entering the market in the late forties which was in large
measure free from rent control.

The analysis presented in this paper indicates that the price of rental 
housing as measured by the price index was a highly significant factor 
explaining the change in average rent paid from 1940 to 1950. So also 
was change in stock. It was of two types: (1) units added by construc- 
tion during the forties, and (2) change during the forties in the tenant 
stock of units in structures in existence in 1940. In addition, difference 
in the treatment of rent-free units in the two censuses had an important 
bearing on the rent change observed for rural nonfarm places. These 
factors account for a large part of the variation among places in the 
increase in rent paid between 1940 and 1950. The relationships observed 
indicate that the rent index was a fairly reliable measure of change in 
the monthly rent of dwellings of specified quality when allowance was 
made for the new-unit bias. 


Census data indicate wide differences in rate of rent increase from 1940 to
1950 in various places. This variation presents a challenge to any investi-
gator curious about factors leading to diversity. Differences by type of com-
munity are especially striking. For example, mean contract rent of 1950 per
$100 of such rent of 1940 by type of community was as follows:¹


All nonfarm          $157
Metro areas²          142
Urban places³         147
Rural nonfarm⁴        197
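
The relatives above rest on means estimated from grouped rent data. A minimal sketch of that estimate in Python follows, assuming (as footnote 1 states) that interval midpoints stand for interval means and that units renting for $100 or more average $110. The rent intervals and unit counts are hypothetical, chosen only to illustrate the arithmetic; they are not census figures, and the function names are illustrative.

```python
# Hypothetical grouped distributions: (lower bound, upper bound, number of units).
# An upper bound of None marks the open-ended class ($100 or more).
OPEN_CLASS_MEAN = 110.0  # assumption taken from footnote 1


def grouped_mean(intervals, open_class_mean=OPEN_CLASS_MEAN):
    """Estimate mean rent from grouped data using interval midpoints."""
    total_rent = 0.0
    total_units = 0
    for lower, upper, units in intervals:
        rent = open_class_mean if upper is None else (lower + upper) / 2.0
        total_rent += rent * units
        total_units += units
    return total_rent / total_units


def relative_per_100(mean_later, mean_earlier):
    """Rent of the later year per $100 of rent of the earlier year."""
    return 100.0 * mean_later / mean_earlier


# Hypothetical distributions, not census data.
rents_1940 = [(0, 10, 200), (10, 20, 500), (20, 40, 250), (40, 100, 45), (100, None, 5)]
rents_1950 = [(0, 10, 50), (10, 20, 250), (20, 40, 450), (40, 100, 230), (100, None, 20)]

mean_1940 = grouped_mean(rents_1940)
mean_1950 = grouped_mean(rents_1950)
print(mean_1940, mean_1950, relative_per_100(mean_1950, mean_1940))
```

Because the same midpoint rule is applied to both years, any upward bias from rents clustering at the lower edge of each interval tends to cancel in the relative, which is the point made in footnote 1.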


* I wish to express my appreciation to the many persons who suggested ways of analysing the relationships 
examined and of presenting the findings of this article. I am especially indebted to Albert Rees. 

1 These relatives are based on my estimates. They assume that midpoints represent means of intervals, and 
that units renting for $100 or more had mean rent of $110. The Bureau of the Census reported mean rent for 1940. 
For all nonfarm dwellings it was five per cent less than my estimates. Since mean rent is not reported for 1950, it 
has seemed best to use means for 1940 computed in the same manner as those for 1950. The relatively high estimate
of mean rent derived from the use of midpoints occurs because rents tend to cluster at the lower edge of the intervals 
used. Errors caused by this clustering seem likely to be of little consequence in an analysis concerned solely with 
relative rents. Data shown here are from Census of Housing: 1940, Vol. II, Part 1, Tables 14, 18, and 103; and Census 
of Housing: 1950, Vol. II, Chapter 1, Tables 14 and 26.

2 Cities and contiguous areas with a population of 50,000 or more were described in the 1940 census as metro-
politan districts. Population growth from 1940 to 1950 increased the number of such population groupings. In the
1950 census they were described as Standard Metropolitan Areas. These groupings of the two censuses are referred 
to as metro areas. The boundaries of many of them changed slightly between the censuses. In spite of such changes 
and the inclusion of small cities and rural nonfarm areas within metro areas the change between 1940 and 1950 
in rent of metro areas approximates the change experienced by large cities in general. Small cities and rural nonfarm 

At the same time the rent index of the U. S. Bureau of Labor Statistics, the
chief measure available on the price of rental housing, was 117.6 in March 1950, 
with index of March 1940 equal to 100. This index relates to large cities. Hence 
rents paid in metro areas would be more likely to reflect its influence than those 
of other places. Even in metro areas the increase in rent paid was appreciably 
greater than that of the rent index. The percentage by which increase in rent 
paid exceeded increase in the rent index, by type of community, is as follows: 


All nonfarm 
Metro areas 
Urban places 
Rural nonfarm areas 
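
One way to make this comparison is sketched below in Python under an assumed definition: the excess is taken as the ratio of the rent relative to the index relative, less one. The article may have measured the excess differently (for example, as a difference in percentage points, shown as the second function), so this is an illustration rather than the author's calculation. The inputs are the relatives and the index value of 117.6 given in the text; the function names are illustrative.

```python
RENT_INDEX_1950 = 117.6  # BLS rent index, March 1950 (March 1940 = 100), from the text

# Mean contract rent of 1950 per $100 of 1940 rent, from the text.
RENT_RELATIVES = {
    "All nonfarm": 157.0,
    "Metro areas": 142.0,
    "Urban places": 147.0,
    "Rural nonfarm": 197.0,
}


def excess_over_index(rent_relative, index_relative=RENT_INDEX_1950):
    """Per cent by which the rent relative exceeds the index relative (assumed definition)."""
    return 100.0 * (rent_relative / index_relative - 1.0)


def excess_in_points(rent_relative, index_relative=RENT_INDEX_1950):
    """Alternative reading: difference between the two relatives, in percentage points."""
    return rent_relative - index_relative


for place, relative in RENT_RELATIVES.items():
    print(f"{place:15s} {excess_over_index(relative):6.1f} {excess_in_points(relative):6.1f}")
```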


The greater rise in average rent paid than in the rent index could occur be- 
cause (1) the rent index understated the rise in the price of rental housing, and 
(2) the quality and quantity of housing and housing services provided by rent 
had increased. This article examines change in rent paid in an effort to isolate 
the effect of these two conditions. This has been done through examining con- 
ditions related to variation in increase in rent paid among cities and states as 
reported in census data.⁵

The analysis indicates that rise in rents registered in the rent index explains 
about one-third of the variation among 30 cities in relative rent paid in 1950. 
Furthermore in these cities an increase of 10 per cent in the rent index was 
accompanied by an average increase of 13 per cent in the average rent paid. The 
greater average rise in rent paid than in the rent index appears to be related to 
change in stock—some of this undoubtedly being the effect of rent control. 
Increase in rent paid is positively related to degree of improvement in quality 
of tenant units, and also to increase in the importance of utilities provided by 
the rent. Such changes caused rent paid to rise faster than the rent index. In 
addition the rent index understated, especially in the late forties, the increase in
price occasioned by new units entering the market at rents above those of
equivalent units in the existing stock. This effect was influenced by the fact that 
such construction was minor in cities where the rent index rose little. When 
change in stock is taken into account a rise in the rent index of 10 per cent was 
accompanied by a rise in average rent paid of about 10 per cent. This tendency 
was observed for all types of communities.⁶


places within metro areas would tend to be affected by much the same conditions as other parts of the areas, and in 
1950 24 per cent of the rural nonfarm dwelling units were within these areas. (This estimate is based on Census of 
Housing: 1950, Vol. I, Chapter 1, Table 5 and Vol. II, Chapter 1, Table 4.)

3 Incorporated and unincorporated places with a population of 2,500 or more, plus some other densely populated
areas on the fringe of large cities. 

4 Nonfarm population other than urban. 

5 Unless otherwise specified the census data used are from the Census of Housing: 1940, Vols. II and III,
and Census of Housing: 1950, Volumes I and II. 

6 It has of course been possible to investigate only such change in rent as is registered in monthly rents as re-
ported by the censuses. With rent control certain changes occurred that affected the accuracy of the rent index as a
measure of a true price index. For consumers acquiring occupancy, bribes of various kinds were common. Rent
control may have led to a reduction in maintenance and perhaps a discontinuance of some services. For discussion
of such aspects of rent see Report of the President's Committee on the Cost of Living, Office of Economic Stabilization, 
Washington: U. S. Gov't Printing Office, 1945, especially pp. 359-64. 

This paper summarizes findings of other investigators and presents the
evidence to support the main findings summarized above. The main topics are 
as follows: (1) Findings of other investigators. (2) Census concepts of rent. (3) 
Price change implicit in the rent index. (4) Change in rent paid in large cities. 
(5) Change in rents paid in rural nonfarm areas. (6) Concluding remarks. 


1. FINDINGS OF OTHER INVESTIGATORS 


Divergence during the forties between the rise in average rent paid and the 
rent index has been noted. Certain interpretations differ from those just sum- 
marized. For example Maisel after examining data available in the late forties 
concluded that “the B.L.S. rent index has not measured change in average 
rents because this index deals with movements within a very limited sphere 
which cannot be generalized.”⁷ He judged that it did not apply to all households
within large cities nor to small cities or rural nonfarm areas. He noted that the 
stock of dwellings had undergone change, and that addition of units at rents 
above the previous average and removal of old units below the market average 
was an important factor in the rise of average rent. He concluded, however, 
that the rise in rents represented solely price increase, since “the quality of all 
rental housing almost certainly fell between 1940 and 1947.” This conclusion 
was reached even though the census reports providing rent data also provided 
evidence of improved state of repair and increased toilet facilities.⁸ In addition
it indicated some decline in rooms per dwelling unit. Maisel also quoted opin- 
ions from various sources on the decline in maintenance and other changes 
during the period when rent control was in effect. His method of estimating the 
magnitude of offsetting tendencies implicit in the various changes is not de- 
scribed. 

Humes and Schiro¹⁰ of the U. S. Bureau of Labor Statistics also examined
the evidence on increase in rent paid from 1940 to 1947, and considered in some 
detail conditions causing average rent paid to increase more than the index. 
They noted that among cities with the greatest increase in average contract 
rent were those which had the highest volume of wartime housing and which 
had experienced the largest shift of units from tenant to owner occupancy. 
They noted that in the city of Seattle 9 per cent of the 1944 rental units had 
come into the market during the period from 1940 to 1947 at average rents 69 
per cent above the city average. Furthermore the rent of these was set in terms 
of rent of units of “equivalent” quality. In addition they reported that average 
rent of units removed from the rental market for owner occupancy, conversion, 


7 Sherman J. Maisel, “Have We Underestimated Increases in Rents and Shelter Expenditures?” Journal of 
Political Economy, (1949), p. 116. 

8 Ibid., p. 108. This evidence was discounted. Maisel thought that it might have been the result of change in
schedule used in reporting (p. 109). He did not note that improvement in tenant units indicated by such data was 
positively related to increase in average rents. See footnote 55 below for such an estimate. 

9 In concluding that quality of housing was lower in 1947 than in 1940 Maisel appeared to have been much
influenced by opinions expressed in the reports on the controversy during the mid-forties related to the accuracy 
of the Consumer Price Index. This controversy related chiefly to the years 1942 to 1944, and much of the improve- 
ment in housing between 1940 and 1947 may have occurred from 1940 to 1942, a period when real income increased 
and residential construction was relatively high. It is also of interest to note that the evidence on decline in quality 
cited by Maisel from Report of the President's Committee on Cost of Living, op. cit., was reached without benefit of the 
data for 1947 available to Maisel which gave evidence of improvement between 1940 and 1947 in quality of tenant 
units in 34 metro areas. 

10 Helen Humes and Bruno Schiro, “The Rent Index,” Monthly Labor Review, (1948), pp. 631-37. 

or demolition, as determined by the CPI samples, was lower than
the average rent of the units remaining. Hence the removal of units for these
purposes raised the average rent of the remaining units.
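
The composition effect described here can be illustrated with a short Python sketch. The rents below are hypothetical; the example only shows that removing units that rent below the average raises the average of those remaining, and that adding units above the average does the same, with no change in the rent of any individual unit.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical monthly rents for a small rental stock (not census or CPI data).
stock = [20.0, 25.0, 30.0, 35.0, 40.0]

# Units leaving the rental market (e.g. shifted to owner occupancy) at below-average rents.
removed = [20.0, 25.0]
remaining = list(stock)
for rent in removed:
    remaining.remove(rent)

# Units entering the market at above-average rents.
added = [50.0]
later_stock = remaining + added

before = mean(stock)
after_removal = mean(remaining)
after_addition = mean(later_stock)
print(before, after_removal, after_addition)
```

A price index that follows the rent of identical units would register no change in this example, while the average rent paid rises at each step.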

Winnick using the 1940 and 1950 censuses of housing for a set of cities
examined rise in average rent between 1940 and 1950 for all dwellings and 
for the stock of 1950 built in 1939 or earlier. He judged that “the decade rise in 
median rents of the inventory already standing in 1940 was nearly as much 
as for the entire inventory.” Consequently he concluded that “the sizable in-
creases in average or median rent over the decade were not primarily accounted
for by new construction.” Using a Spearman coefficient of rank correlation he
observed what he refers to as a “loose relationship” between change in rent 
paid and in the rent index. On the basis of this he concluded that “the BLS 
index does not serve as a reliable first approximation to the average rent change 
of even the older rental inventory.”¹²

Three things¹³ account for much of the difference between my findings and
those of Maisel and of Winnick: (1) greater reliance on quantitative estimates, 
(2) the greater number of conditions taken into account, and (3) different as- 
sumptions about the form of relationships and suitable statistical techniques. 
My analysis used data not available to Maisel, who wrote in the late forties 
and had less information on the nature of change in stock and fewer observa- 
tions for the study of variation among places than were provided by the 1950 
census. He did, however, have data for 34 metro areas providing some de-
scription on change in stock. Little use appears to have been made of these.¹⁴
My analysis differs from both that of Maisel and of Winnick in that I used data 
for various places to estimate the magnitude of the effect of change in stock. 
Both Maisel and Winnick commented in some detail on types of factors that 
might have been present in various places. Winnick made one estimate of 
relationship among cities, i.e., that of change in rent paid to the rent
index as indicated by a Spearman coefficient of rank correlation, whereas I 
assumed that the interrelationship of the variables was proportional, and 
utilized coefficients of regression and of determination of change in stock and 
the index in relation to average rent. 
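
The contrast between the two techniques can be sketched as follows in Python. The city figures are hypothetical and serve only to show the computations: a Spearman coefficient of rank correlation on the one hand, and on the other a least-squares regression of the logarithm of the rent relative on the logarithm of the index relative, with its coefficient of determination. The single-variable regression is a simplification; the analysis described in the text also takes change in stock into account. All function names are illustrative.

```python
import math


def ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman coefficient: product-moment correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def log_regression(x, y):
    """Least-squares fit of log(y) = a + b*log(x); returns (a, b, r_squared)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = my - b * mx
    return a, b, pearson(lx, ly) ** 2


# Hypothetical relatives for six cities (1940 = 100); not data from the article.
index_relative = [105.0, 110.0, 115.0, 120.0, 125.0, 130.0]
rent_relative = [128.0, 131.0, 145.0, 143.0, 158.0, 162.0]

rho = spearman(index_relative, rent_relative)
a, b, r2 = log_regression(index_relative, rent_relative)
print(rho, b, r2)
```

The rank coefficient reports only how closely the orderings agree. In the log form the slope b reads as an elasticity, so a slope near 1.3 would correspond to a 10 per cent rise in the index being accompanied by a rise of roughly 13 per cent in rent paid.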


2. CENSUS CONCEPTS OF RENT


Contract rent is defined by the Census of Housing of 1950 as follows:¹⁶


“Contract monthly rent is the rent at the time of the enumeration contracted for 
by the renter, regardless of whether it includes furniture, heating fuel, electricity, 
cooking fuel, water or other services sometimes supplied.” 


11 Louis Winnick, American Housing and Its Use, New York: John Wiley & Sons, Inc., 1957, p. 112.

12 Ibid., p. 114. Winnick appears to have relied on Maisel's findings even though the relation of change in rent
paid to the rent index was much more obvious by 1950 than it was in 1947. See footnote 36 for further comment 
on Winnick’s findings. 

13 Additional differences occurred: I used mean rents whereas Winnick used medians. The means show slightly 
less increase in rent than the medians, but the degree of associations of the variables was little affected by the type 
of average used. In addition I used the rent index of March of the respective years, whereas Winnick used the annual 
rent indexes. This difference had a minor effect on the relations. 

14 The data are from U.S. Bureau of the Census, Current Population Reports, Series P-71, nos. 1-35, released in
1947. See footnote 55 for estimates between variables related to change in rent paid as indicated by these data. 

15 Almost all variables are expressed in log form. This practice was adopted after observing that linearity tended
to be increased with variables in this form. Variables expressed in the log form are designated by subscripts of lower 
case letters, and those in arithmetic form by subscripts of upper case letters. 

16 Census of Housing: 1950, Vol. I, Chapter I, page xv. This article is confined to an examination of contract
rents. It is, however, of interest to note that the censuses of 1940 and 1950 also report gross rent. This is “contract 
Thus rent as here defined provides more than space in structures.

In 1950 no dollar estimates were made for rent-free tenant units. Such estimates
were, however, made in 1940, and were combined¹⁷ with reports on
contract rent paid. In commenting on this difference in treatment the U. S.
Bureau of the Census stated:

“The contract monthly rent data for the renter-occupied nonfarm units are considered
comparable for the 1950 and 1940 censuses, although in 1950 no dollar estimates
were made for the rent-free units.”


Evidence for this judgment was not cited. The small percentage of rent-free 
units in urban places in 1950 provides some grounds for assuming that their 
inclusion in 1940 and not in 1950 can be ignored. Rent-free units were, however, 
important in rural nonfarm areas; and the treatment of rent-free units seems to 
have had an important effect on rent change observed. Evidence bearing on this 
effect is considered later.¹⁸

Throughout this article rent paid refers to contract rent, even though in 
1940 the imputed rent of rent-free units is included. Furthermore all averages 
are my estimates of means.¹⁹


3. PRICE CHANGE IMPLICIT IN THE RENT INDEX? 


The Consumer Price Index of the U. S. Bureau of Labor Statistics presumes
to measure relative price, between periods, of an identical bundle of consumer
products and services typical of those used by families of wage-earners and
lower-salaried clerical workers, during a base period. The universe of the rent
index²⁰ differs slightly from that of the other components of the CPI, in that
since 1942 it has related to a random sample of tenant dwelling units. This 
coverage plus the fact that coverage prior to 1942 was broad makes it seem 
likely that the rent index from 1940 to 1950, if unbiased, described the change 
in rents of units continually in the rental stock and not undergoing important 
change. The index of the years 1940 to 1950 was confined to large cities. It will 
represent the price of rental housing in small cities and rural nonfarm areas 
only if all types of nonfarm communities are experiencing common change. 
The findings of this article indicate that common elements were important. 

There is the further question of comparability of the housing priced from
time to time. In other words, were the quality and quantity of housing priced
identical in the various periods? Change in rent is estimated by following rents


monthly rent plus the reported average monthly cost of utilities (water, electricity, gas) and fuels such as wood, coal 
and oil, if these items are paid for by the renter in addition to contract monthly rent. If furniture was included in 
contract rent, the reported estimated rent of the dwelling without furniture was used in the computation rather than 
contract rent.” 

Difference between contract and gross rent varies with type of structure. The greater the number of units in a 
structure the more likely are utilities to be included in the rent, and the more nearly alike are contract and gross 
rent. Marked differences tend to occur between contract and gross rent of 1-unit structures. 

Change in type of structures in tenant occupancy tends to reduce the comparability of contract rents at two 
points of time. Because of this, gross rent may reveal basic tendencies better than contract rent. However, gross 
rent is far from perfect as a measure of price of identical housing. It is likely to have more reporting errors than 
contract rent; and when consumption of utilities is rising it will be more influenced by increased consumption of 
these than will contract rent. 

17 All data for 1940 include the imputed rent of rent-free units.

18 See footnote 57. 

19 See footnote 1. 

20 For a description of this index see Helen Humes and Bruno Schiro, “The Rent Index,” Monthly Labor Review,
(1949), pp. 60-68. 
of a selected set of dwellings; minor changes in quality tend to be ignored. When
capital improvements are made, such as the addition of a room, dwellings so 
improved are dropped from the sample or re-introduced by linking. In general 
as a given set of units gets older it is likely on the average to depreciate some- 
what, even if maintenance is at a high level. The rent index will then understate 
the true price index, unless offsetting conditions are present. 

One offsetting condition seems likely to have been present, namely, increased 
consumption of utilities. The contract rent of a considerable percentage of 
dwellings includes some utilities, the cost of which is influenced by price and
volume. Volume has probably been increasing. Families in general have been 
increasing the size of their refrigerators and the use of electrical equipment in 
their kitchens and dwellings in general. The upward trend in such use was prob- 
ably important during the forties. Increase in rent to cover the increased con- 
sumption of electricity or other utilities provided by contract rent would in- 
crease both contract rent and the rent index, but such a rise relates to consump- 
tion and not to the price. 

Partial or complete absence of rent control for the construction of the late 
forties affected the accuracy of the rent index. During these years dwelling 
units entered the market with rents above those of equivalent quality under 
rent control. This higher rent did not affect the rent index, since it reflects only
that rent change occurring after units are in the sample. 
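
The mechanism can be illustrated with a minimal sketch in Python. The rents, the number of units and the function names below are hypothetical and are not taken from the BLS sample; the sketch assumes only that the index follows units already in the sample, while average rent paid covers all units in the stock.

```python
def matched_sample_index(base_rents, later_rents):
    """Rent index (base = 100) computed from units priced in both periods."""
    return 100 * sum(later_rents) / sum(base_rents)


def mean(values):
    return sum(values) / len(values)


# Hypothetical sample: nine rent-controlled units whose rent rises from $40 to $44.
base_rents = [40.0] * 9
later_rents = [44.0] * 9

# One hypothetical new unit of equivalent quality enters the market at $60.
# It has no base-period rent, so it cannot move the matched-sample index.
new_unit_rents = [60.0]

index = matched_sample_index(base_rents, later_rents)
relative_average_rent = 100 * mean(later_rents + new_unit_rents) / mean(base_rents)

print(f"rent index: {index:.1f}")
print(f"relative average rent paid: {relative_average_rent:.1f}")
```

In this illustration average rent paid rises more than the index solely because the new unit enters at a higher rent, which is the direction of the understatement described above.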

The Bureau of Labor Statistics has referred to this underestimate as the
new-unit bias, and has provided estimates of its probable magnitude. The one
pertaining to January 1950 led the Bureau of Labor Statistics to conclude that the
rent index of the 34 cities combined would have been 5.5 per cent²¹ higher had it
reflected rents of new units of equivalent quality. The 1950 Census of Housing
gives some indication that this estimate overstated the new-unit bias. For
example, in metro²² areas in general, mean rent in 1950 of all units exceeded by
4.8 per cent that of units in structures remaining from 1940,²³ and some of this
higher rent seems likely to have been the result of higher quality.²⁴ The BLS


21 “Correction for New Unit Bias in the Rent Component of CPI,” Monthly Labor Review, (1951), p. 442.

22 The data used by the BLS in making its estimate of the new-unit bias were confined to the main city or urbanized
portion of 34 metro areas. The urbanized area of Atlanta, Ga., for example, included 76 per cent of the dwelling
units of the metro area, and that of Seattle, Wash., included 84 per cent. The new construction was higher in metro 
areas than in urbanized areas. This difference implies that the BLS would have observed an even greater new-unit 
bias had the survey covered metro areas in general. This likelihood provides further reason to suspect that the 
new-unit bias reported was an overstatement. 

23 The estimate of the new-unit bias made by the Bureau of Labor Statistics included more than new construction.
Units added to the stock through conversions were classed as “new,” even if they were in structures in the
stock of 1940. The importance of “new” tenant units as reported by the Bureau of Labor Statistics and the impor- 
tance of tenant units built in the forties as reported in the 1950 census are highly correlated among the various 
cities. The difference in these percentages indicates that tenant stock of 1950 of the index cities had on the average 
about 16 units from conversion during the forties for every 100 units newly constructed. This is only a crude es- 
timate. Unfortunately the BLS did not report the number of converted units in its set of new units nor their 
average rent. 

24 Indexes of characteristics in 1950 for metro areas in general, (a) of tenant units built during the forties and
(b) of all units, with those remaining from 1940 equal to 100, are as follows:

                                            Tenant units built    All tenant units
Contract rent                                     132.6                104.4
Index of importance of quality-A units            127                  103
Mean rooms per unit                                95.6                 99.5
Number of units                                    15.5                115.5
in making its estimate of the new-unit bias stratified units by certain structural
and other characteristics; thus some standardization for quality occurred. Age
of dwelling appears to have been ignored,²⁵ so that rent of a unit built in 1910
might have been compared with that of a unit built in 1942. Crudeness of such 
standardization may well have resulted in an overstatement of the new-unit 
bias. This likelihood made it seem best in examining conditions related to 
change in rents between 1940 and 1950 to accept the unadjusted rent index as 
the best available estimate of the price of rental housing and to control where 
feasible on change in stock. 


4. CHANGE IN RENT PAID IN LARGE CITIES²⁶


Among 30 cities average rent paid in 1950 per $100 of rent paid in 1940 ranged
from $117 in New York, N. Y., to $205 in Jacksonville, Fla.; and the estimated
rent indexes of March 1950²⁷ of these cities, with March 1940 equal to
100, were 106 and 138, respectively. The mean relative rent paid of 1950 reported
for these 30 cities was $153, whereas the mean rent index was 120.
Thus in these cities the mean relative rent paid exceeded the mean rent index
by 27 per cent.²⁸
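
The means just cited are unweighted, and the accompanying footnote reports weighted means of 136.4 and 116.5. A minimal sketch of the distinction follows. The two relative rents are those given above for New York and Jacksonville; the numbers of tenant dwellings used as weights are hypothetical and serve only to show why an unweighted mean overstates the rise when large cities rose less.

```python
def unweighted_mean(values):
    return sum(values) / len(values)


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Relative rent paid in 1950 (1940 = 100) for the two extreme cities named in the text.
relative_rent = {"New York": 117.0, "Jacksonville": 205.0}

# Hypothetical numbers of tenant dwellings (thousands), used only as weights.
tenant_units = {"New York": 1800.0, "Jacksonville": 30.0}

cities = list(relative_rent)
simple = unweighted_mean([relative_rent[c] for c in cities])
weighted = weighted_mean(
    [relative_rent[c] for c in cities],
    [tenant_units[c] for c in cities],
)

# Excess of mean relative rent over the mean rent index, using the weighted
# means reported in the footnote (136.4 and 116.5).
excess_pct = (136.4 / 116.5 - 1) * 100

print(simple, round(weighted, 1), round(excess_pct, 1))
```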

Among the 30 cities the relative rent paid in 1950 and the rent index are positively
related. The correlation is as follows:


    X_rc = -.4717 + 1.2753 X_ic        r² = .31        (1)


where X_rc is rent paid in 1950 for all rented dwellings in the respective cities
per $100 of rent paid in 1940; and X_ic is the rent index of 1950, with 1940 equal
to 100, both variables being in log form. Thus an increase of 10 per cent in the
rent index was accompanied by an increase of 13 per cent in average rent paid,
and the rent index explains 31 per cent of the variation among the cities in
percentage increase in rent paid. This correlation is highly significant, i.e., at
the .01 level. Furthermore with X_ic equal to 100, i.e. no change in the rent index,
estimated X_rc is 120, indicating that relative rent in 1950 of dwelling units
exceeded the rent index by 20 per cent. Thus the rent index explains much of


Thus higher average rent of the new compared to existing units, i.e. 33 per cent, was accompanied by improved
quality, i.e. 27 per cent, and by a slight decrease in number of rooms per unit, i.e. 4.4 per cent. The volume of new 
construction, i.e. that of the forties, was sufficient to raise the quality index by 3 per cent and to lower the average 
number of rooms by 0.5 per cent. (See footnote 31 for description of the quality index used here.) 

25 The degree of maintenance was also ignored, apart from extreme conditions involving dilapidation. (Op. cit.,
pp. 440, 444.) 

26 A rent index is reported for 34 cities. Four were excluded from this analysis, i.e., Manchester, N. H., Mobile,
Ala., Portland, Me., and Savannah, Ga., because data for relevant analyses were unavailable in the 1950 census of 
housing. 

27 Unless otherwise specified all estimates of the rent index relate to March of 1940 and of 1950. The rent index
of March 1950 was estimated by interpolation based on data for various months reported in the Monthly Labor
Review.

28 These estimates relate to unweighted means of the 30 cities. Since the increase in rent paid and in the rent
index was less for large than small cities these means overstate both the rise in average rent paid and the rent index 
of tenants of these cities. The weighted mean relative rent paid in 1950 of these cities was 136.4 and the mean 
rent index was 116.5, with number of tenant dwellings in 1950 used as weights. Thus the mean relative rent ex- 
ceeded the mean rent index by 17.1 per cent. This increase in average rent paid is very similar to that of metro 
areas in general, and the increase in the rent index is very similar to that of the 34 index cities combined. On this 
basis it seems reasonable to conclude that the experience of these 30 cities is fairly representative of that of large 
cities in general. 
the variation among cities in degree of rent change but falls far short of
explaining increase in average rent paid.
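
The figures quoted for equation (1) can be checked with a minimal sketch. It assumes common (base-10) logarithms, which the text does not state, and takes the intercept as negative, the sign consistent with the statement that a rent index of 100 implies a relative rent of about 120. The function names are illustrative only, and the critical value is the standard two-sided .01 point of Student's t with 28 degrees of freedom.

```python
import math

INTERCEPT = -0.4717   # sign inferred from the text, see the note above
SLOPE = 1.2753
R_SQUARED = 0.31
N_CITIES = 30


def predicted_relative_rent(rent_index):
    """Estimated rent paid in 1950 per $100 of 1940 rent (base-10 logs assumed)."""
    return 10 ** (INTERCEPT + SLOPE * math.log10(rent_index))


def pct_response(pct_rise_in_index):
    """Percentage rise in rent paid implied by a percentage rise in the index."""
    return ((1 + pct_rise_in_index / 100) ** SLOPE - 1) * 100


def t_statistic(r_squared, n):
    """t for testing a zero simple correlation, with n - 2 degrees of freedom."""
    return math.sqrt(r_squared * (n - 2) / (1 - r_squared))


T_CRITICAL_01 = 2.763  # two-sided .01 level, 28 degrees of freedom (table value)

print(round(predicted_relative_rent(100), 1))   # close to the 120 cited in the text
print(round(pct_response(10), 1))               # close to the 13 per cent cited
print(t_statistic(R_SQUARED, N_CITIES) > T_CRITICAL_01)
```

Because both variables are in log form, the slope acts as an elasticity, which is why a 10 per cent rise in the index corresponds to a rise of roughly 13 per cent in rent paid rather than a fixed dollar amount.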

In spite of a significant correlation between the two types of rent change and 
the close correspondence in average rate of change, the scatter in Figure 366, 
Panel A shows evidence of two or more systematic variables. This is indicated 
by the lack of symmetry in the residuals, e.g. by the tendency of relative rent 
paid to be above the regression line at high levels of the rent index and below at
low levels. 

This characteristic is associated with the importance of new construction. 
The correlation of the variables is as follows: 


    X_rc = -.6297 + 1.3083 X_ic + .09946 X,.        R² = .66        (2)


where X_rc and X_ic are the variables of equation (1) and X,, the percentage of
tenant-units of 1950 that were constructed during the forties, all variables
being in log form. The rent index and the importance of new construction together
explain 66 per cent of the variation in X_rc, and a 10 per cent increase in the
rent index was accompanied by a 13 per cent increase in rents paid. Furthermore
equation (2) indicates that if X_ic is 100, i.e. no change in the rent index, and
the percentage of new construction equals 1 per cent, then estimated X_rc
is $97. In other words average rent paid in 1950 would have been 3 per cent
less than in 1940.
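
Equation (2) can be evaluated numerically as a check on the figures just quoted. The following is a minimal sketch, assuming common (base-10) logarithms, which the text does not state. The function and argument names are illustrative only, and the 10 per cent share of new construction is a hypothetical input.

```python
import math


def eq2_relative_rent(rent_index, pct_built_in_forties):
    """Estimated rent paid in 1950 per $100 of 1940 rent, from equation (2).

    Both inputs enter in log form; base-10 logs are an assumption.
    """
    log_rent = (-0.6297
                + 1.3083 * math.log10(rent_index)
                + 0.09946 * math.log10(pct_built_in_forties))
    return 10 ** log_rent


# Case discussed in the text: no change in the rent index, 1 per cent new construction.
no_change = eq2_relative_rent(100, 1)

# Hypothetical case: same rent index, 10 per cent of tenant units built in the forties.
with_new_construction = eq2_relative_rent(100, 10)

print(round(no_change), round(with_new_construction))
```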

This correlation supports the conclusion that for these 30 cities the rent in- 
dex and the importance of new construction provide a good basis for predicting 
change in average rent between 1940 and 1950. Since other evidence makes it 
improbable that new construction by itself had the effect ascribed to it by equation
(2), there is the further question of what conditions affecting average rent
were associated with the importance of new construction. This question is now
considered by examining, for the 30 cities, first the variation in rents in 1950 of
units constructed during the forties, and second the variation in relative rent in
1950 of units in structures remaining from 1940.

Construction of the Forties and Relative Rent Paid in 1950, 30 Cities. For the
30 cities the rent of all units in 1950 exceeded by 4.6²⁹ per cent the rent of units
of the stock remaining from 1940. For example, mean contract rent of all units
in 1950 in Scranton, Pa. exceeded by only 0.7 per cent the rent of units remaining
from 1940. For Norfolk, Va. the corresponding figure is 15.5 per cent.
This excess represents the effect on average rent of the construction of the
forties, referred to as new construction. It varies greatly among cities.

Five variables³⁰ explain 82 per cent of the variation among the 30 cities in
the relative rent of all units per $100 of rent in 1950 of units in the stock
remaining from 1940. The correlation is as follows:


29 The mean for 30 cities, each city given a weight of one.

30 Relative rent of all units compared to that of tenant stock remaining from 1940 is affected by two main
conditions: (1) the importance of units built during the forties, and (2) rent of units constructed during the forties
compared to that of stock remaining. In equation (3) variables X_nc and X_PC take condition (1) into account, and
other variables relate to condition (2). Variation among places in mean rent of new compared to other units was
separately examined. Variables similar to X_ic, X_qc and X_PC explain about two-thirds of the variation among cities
in mean rents of new units compared to those remaining from 1940. Their effect is similar to that indicated in equation
(3) except X_ic is a more dominant factor.


    X_hc = .8403 - .1600 X_ic + .3351 X_qc + .3840 X_sc + .02594 X_nc
           + .0001852 X_PC        R² = .82        (3)


where X_hc is rent in 1950 of all units per $100 of average rent in 1950 of units
remaining from 1940, X_ic is the rent index, X_qc is an index of relative quality,³¹
X_sc is an index of rooms per unit,³² X_nc is the number of all units per 100 of
units remaining from 1940³³ and X_PC is the percentage of new construction built
in 1945 or later, with all variables except X_PC expressed in log form.

The signs are consistent with expectations. For example, those indicating
improved quality are positive, i.e. X_qc³⁴ and X_sc. There is thus some reason to
expect that increase between 1940 and 1950 in average rent paid in some measure 
represented increased consumption. 

Tenant Stock of 1950 Remaining from 1940, 30 Cities. The exclusion of units
constructed during the forties lowered slightly³⁵ the level of relative rents in
1950 and reduced slightly the variation of relative rent among cities. Thus relative
rents in 1950 of units remaining from 1940 compared to those of all
tenant units in 1940 much exceeded the rent index. The two sets of relative
rents in relation to the rent index are shown in Figure 366, Panels A and B.

The correlation between relative rent in 1950 of stock remaining from 1940
and the rent index of the 30 cities is as follows:


    X_ec = -.5084 + 1.2846 X_ic        (6)³⁶


where X_ec is rent paid in 1950 for units in the stock remaining from 1940 per
$100 of mean contract rent in 1940 and X_ic is the rent index, both variables
being expressed in log form. Thus an increase of 10 per cent in the rent index
was accompanied by an increase of 13 per cent in the relative rent of the stock
remaining; and with the rent index equal to 100 the expected X_ec of equation
(6) is 115,³⁷ in other words, average rent 15 per cent above that of 1940. An


31 This is based on the importance of units of quality-A, defined here as those not dilapidated and with private
toilet and bath and hot running water. The index is the ratio of the percentage of all units of this quality in 1950 
with the corresponding percentage for the units remaining from 1940 equal to 100. 

32 This is the ratio of mean rooms per unit in the entire stock to mean rooms per unit in the remaining
stock. Mean rooms per unit were computed on the assumption that units with 10 or more rooms averaged 12 rooms
per unit.
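
The computation described in this footnote amounts to a grouped mean with an assumed value for the open-ended class. A minimal sketch follows; the distributions of units by number of rooms are hypothetical, and the function name is illustrative only.

```python
def mean_rooms(units_by_rooms, open_class=10, open_class_mean=12.0):
    """Mean rooms per unit; units with `open_class` or more rooms count as 12 rooms."""
    total_units = sum(units_by_rooms.values())
    total_rooms = sum(
        (open_class_mean if rooms >= open_class else rooms) * units
        for rooms, units in units_by_rooms.items()
    )
    return total_rooms / total_units


# Hypothetical counts of units by number of rooms (key 10 means "10 or more").
entire_stock = {1: 50, 2: 150, 3: 300, 4: 250, 5: 150, 6: 70, 7: 20, 8: 6, 9: 2, 10: 2}
remaining_stock = {1: 40, 2: 140, 3: 300, 4: 260, 5: 160, 6: 70, 7: 20, 8: 6, 9: 2, 10: 2}

# Index of rooms per unit: entire stock relative to the remaining stock (= 100).
rooms_index = 100 * mean_rooms(entire_stock) / mean_rooms(remaining_stock)

print(round(mean_rooms(entire_stock), 2), round(rooms_index, 1))
```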

33 Its simple correlation with X_hc is as follows:


    X_hc = 1.5617 + .2229 X_nc        r² = .34        (4)


The regression coefficient of X_nc of equation (3) is only .0259, about one-tenth of that shown here for this equation
where variables describing quality are not held constant.


34 X_qc by itself explains a large part of the variation in X_hc. The correlation is as follows:

    X_hc = 1.2680 + .3702 X_qc        r² = .68        (5)


X_qc is directly related to several other variables, for example, X_nc and X_sc, which are also directly related to X_hc.

35 The small effect reflects the fact that stock remaining constituted 86.6 per cent of all units of 1950.

36 This equation describes a relationship between variables that was judged by Winnick to be “loose.” (Op. cit.,
p. 113.) He based this appraisal on a Spearman coefficient of correlation of .40, and failed to observe the highly 
significant correlation of the variables of equation (6). 

37 Under these conditions equation (1) for all units in 1950 compared to 1940 indicates a relative rent of $120.


Fig. 366. Relative rent paid in 1950 (average rent of 1940 equals $100) in relation to
the rent index, 30 cities. 


(Data expressed in log form) 
examination of change in the stock of 1940 in relation to average rents of 1950
indicates that it accounts for much of the variation in relative rent of the stock
remaining not explained by equation (6), as well as some of the tendency for the
regression coefficient of equation (6) to be somewhat greater than 1.0, and for
estimated relative rent to be greater than $100 when the rent index equals 100.
See equation (7) below. Numerous changes in stock likely
to affect average rent appear to have occurred. These are now briefly reviewed 
prior to presenting estimates of their relation to change in rent paid. 

Evidence on Change in Character of 1940 Stock. Much of the data in the 
censuses of housing indicating change over the decade in the 1940 stock of 
tenant units are suggestive rather than definitive. The following changes seem 
likely to have affected average rent paid without affecting the rent index.³⁸

(1) An upward trend was underway in percentage of tenant units with 
various facilities included in rent. This was indicated for a set of 8 cities by a 
combination of census and other data. In addition there was an upward trend 
in the percentage of units with water, refrigeration and electricity included in 
the rent. These trends are indicated by the census and other data. The mean 
percentages are given in Table 368.³⁹ Two types of change affecting the stock


TABLE 368. PERCENTAGE OF TENANT UNITS WITH SPECIFIED ITEMS INCLUDED IN RENT


Year      Water      Refrigeration      Electricity      Furniture
1929       54.5            6.4               6.6             6.5
1933       60.1           11.2               8.4             8.7
1950       76.4           30.4              22.8            20.8


remaining from 1940 could have contributed to these changes: (a) The disappearance
from the 1940 tenant stock⁴⁰ of units lacking these facilities. (b)
An increase in the practice of landlords’ providing these facilities in rent. Evidence
bearing on change of type (a) is now to be considered.

(2) Between 1940 and 1950 a marked decrease occurred in the importance of 
units in structures of one or two units, and conversely an increase of units in 
structures with three or more units. This probably contributed to the increase 
in the importance of tenant units with various items included in the rent.⁴¹
Such a likelihood is indicated by data shown in Figure 369. On the horizontal 


38 Two changes are ignored here, namely the transfer of units from farm to non-farm residence and change in
vacancy status of dwellings. The first of these seems likely to have added appreciably to the rural non-farm stock,
but to have had little effect within large cities as such. The second seems unimportant even though vacancy rates
were lower in 1950 than in 1940, because occupied and unoccupied units had much the same average rent in 1940.

39 The basic data come from David L. Wickens, Financial Survey of Urban Housing, U. S. Dept. of Com. 1937,
Table 11, Census of Housing: 1940, Vol. III, state reports, various tables, and the Monthly Labor Review, (1954),
p. 748. These sources report one other type of change, namely the percentage of units with a garage. The mean
percentages for the cities for the respective years are as follows: 1929, 20.8, 1933, 26.0 and 1950, 18.4 per cent. The
trend for this characteristic is not continuous as is that of the other four characteristics. (These data by cities are
summarized by Leo Grebler, David M. Blank and Louis Winnick, Capital Formation in Residential Real Estate, 
Princeton: Princeton University Press, 1956, p. 418.) 

40 These data on utilities included in rent relate to all tenant units. Hence some of the trend shown may have
been the result of a relatively high percentage of units entering the stock during the forties that provided these 
facilities. It seems highly improbable that they account for all the change. 

41 See footnote 16 for comment on utilities in rent by type of structure.
axis is the decrease between 1940 and 1950 in the percentage of tenant units in
structures of one or two units; and on the vertical-axis is the increase between 
1933 and 1950 in the percentage of tenant units with electricity included in 
rent. The observations shown here indicate that the greater the decrease in the 
importance of tenant units in structures of one or two units the greater the 
increase in rental contracts that include the cost of electricity. 

An increase in the percentage of units with electricity in the rent seems likely 
to be associated with a greater rise in the average rent paid than in the rent


Fig. 369. Increase between 1933 and 1950 in the percentage of tenant units with electricity
included in rent in relation to the decrease between 1940 and 1950 in the percentage of
tenant units in structures of one or two units, nine cities.
index. Thus it seems reasonable to expect that the greater the drop in the
importance of tenant units in structures of one or two units the more the rise
in average rent would exceed that of the rent index.⁴²

(3) In 1950 the number of tenant units remaining from 1940 was in general 
less than the number in the 1940 stock. The units in the 1950 stock remaining 
per 100 units in the 1940 stock had a mean of 88; and the range was from 75 
in Philadelphia, Pa. to 103 in San Francisco, Calif. These ratios will be 
referred to as estimates of the importance of the tenant stock remaining from 
1940. In 27 of the 30 cities the ratio was less than 100, indicating net decline. 
This would occur in cities having a relatively high percentage of units with- 
drawn from residential use because of demolition, a relatively large volume of 
transfer of units from tenant to owner occupancy and relatively few additions 
to or conversions of units in existing structures. In three cities the ratio of the 
stock remaining from 1940 is slightly above 100. These cities probably have had 
few demolitions, few transfers of units from tenant to owner occupancy or 
many additions to or conversions of existing structures. 

The effect on average rent of such change in units in the stock would depend 
on the relative rent of the units lost or added. One such change has already been 


42 This change was probably related to transfer of units from tenant to owner occupancy. 
noted, i.e. reduction in the importance of units in structures of one or two units,
and hence the probable increase in the importance of units with electricity and 
other services provided in rents. This change was probably related to the trans- 
fer of units from tenant to owner occupancy. Other conditions relating to such 
transfer may have affected relative rents. 

(4) Change between 1940 and 1950 in the entire stock of units existing in 
1940 differed by year of construction. Disappearance from the stock was greater 
for units built prior to 1920 than those built from 1920 to 1940. Since in 1950 
the average age of dwelling was higher for tenant than owner units, undoubtedly 
a good deal of the loss related to units in tenant occupancy in 1940. If average 
quality and rent tend to decrease with higher age of structure, then one would
expect the greater the net loss of units built in 1919 or earlier the greater the 
increase in average rent between 1940 and 1950. 

If the 1950 census had provided data by year of construction for the two 
tenures, additional insights might have been provided on the changes affecting 
relative rent in 1950. Data on stock in general leave no doubt, however, that 
stock remaining in 1950 differed appreciably from that of 1940, and that the
degree of change by age of structure differed markedly among cities. 

(5) Some increase occurred in the percentage of tenant units with preferred 
structural facilities. For example in 20 of the 30 cities a higher percentage of 
the tenant units of the stock remaining than in the entire tenant stock of 1940 
had private toilet and bath.⁴⁴ For the 30 cities the mean percentage of units with
these facilities was as follows: 


43 In general the number of units in structures constructed in 1919 or earlier was less in 1950 than in 1940,
indicating for these greater withdrawal of units from stock than addition of units to existing structures by con- 
version or new additions or by transfers from nonresidential uses. (The year of construction reported is presumably 
that of the original structure. Thus if a wing was added during the forties to an apartment dwelling constructed in 
the thirties, the units in the new wing would be correctly included among those built in the thirties.) 

In 21 of the 30 cities a reduction occurred in the number of units built in 1919 or earlier. For the 30 cities the 
number of units in the 1950 stock per 100 units in the 1940 stock by year of construction is as follows: 


                           Mean Ratio of the Number of Units       Range in the Ratio
Year of Construction       in the 1950 Stock per 100 in 1940       Among the 30 Cities
1919 or earlier                          96.4                         83.6 to 105.7
1920-1929                               113.2                         96.5 to 158.2


(These data are shown in Census of Housing: 1950, Vol. 1, Table 30 and Census of Housing: 1940, Vol. 3, state 
reports, various tables. Similar data are not reported by tenure. It has been assumed that all “no reports” relate 
to dwellings constructed in 1919 or earlier. The percentage of dwellings for which no report by year of construction 
is given is greater in the 1940 than in the 1950 census.) 

Some of the decline indicated for the stock built prior to 1920 may have been the result of an underestimate of 
age of dwellings. It may be that the greater the age the more likely is it to be underestimated. If this tendency 
occurs then some dwellings reported in this category for 1940 would in 1950 have been reported as built in 1920- 
1939. The occurrence of such a tendency would introduce a negative relationship between the ratios of this category 
and those of dwellings built in 1920-1939. Such a relationship was observed. It was, however, significant only at
the .5 probability level.

The addition of units to existing structures including the conversion of existing units appears to have been concentrated
in structures built during the twenties and thirties. In only one of the 30 cities was the number of units
in structures built in these years less in 1950 than in 1940. If the additions were new construction, i.e. an addition
of units to structures erected initially in the twenties or thirties, one would expect they would contribute to higher 
rent in 1950. If the additions were conversions in the period of rent control, one would expect the increase in units 
to raise the average level of rents paid only if the utilities included in rent were above the average level of the 1940 
stock. No systematic information on such difference has been noted. 


44 The categories are not strictly comparable between the two censuses. The data for 1940 relate to tenant units
with private toilet and bath; and for 1950 to units not dilapidated and with private bath and toilet and running water
and to those dilapidated and with private bath and toilet and hot running water plus an estimate of other units
with private toilet and bath and cold running water. This estimate was made as follows: (1) the number of
                                          Percentage with
Set of Tenant Units                       Specified Facilities      Relative Level
Entire stock of 1940                             68.5                   100.0
Those of 1950 remaining from 1940                70.0                   102.2


These data indicate some improvement in average quality. It seems highly 
probable that some increase also occurred in the percentage of units in the 
stock remaining that had central heat. Unfortunately the census of housing of 
1950 does not report central heat by year of construction.⁴⁵ Increase in importance
of units with preferred structural facilities seems likely to have increased
average rent paid but not the rent index. Furthermore such change⁴⁶ in the
units of the 1940 inventory varied greatly among cities. For example the percentage
of tenant units with private toilet and bath increased by 14 per cent in
Scranton, Pa.,⁴⁷ and decreased by 11 per cent in Kansas City, Mo.

(6) The mean rooms per unit in 1950 of the stock remaining from 1940 was
less than that of the stock of 1940. Means for the 30 cities are as follows:


Year        Mean        Relative Level
1940        4.00            100.0
1950        3.64             91.0


In this respect the tenant units in 1950 of stock remaining were of lesser quality 
or quantity than units of the stock of 1940. On this basis alone one would expect 
rents paid to have risen less than the rent index.48

Many changes between 1940 and 1950 affecting the tenant stock were inter- 
correlated. For example the decrease in rooms per dwelling and the increase in 
the percentage of units with private bath and toilet were undoubtedly affected 
by the decrease in the importance in the tenant stock of units in structures of 
one or two units. These changes may have had offsetting effects on average 
rent.49 The increase in the percentage of units with private bath and toilet


units with private toilet and bath and with cold running water in the “not-dilapidated” category per 100 units in 
this category, exclusive of those with private toilet and bath and hot running water, was computed, (2) this ratio 
was divided by two (an arbitrary decision) and applied to the number of units in the dilapidated category
exclusive of those with private toilet and bath and hot running water. For cities in the North and West the adjustment
for change in categories reported was unimportant. 
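
The two-step estimate described in this footnote can be sketched as follows. This is only an illustrative reading of the procedure; the function name, argument names, and the counts in the usage line are hypothetical and are not taken from the census tables.

```python
def estimate_dilapidated_cold_water_units(
    not_dilap_total, not_dilap_hot, not_dilap_cold, dilap_total, dilap_hot
):
    """Estimate dilapidated units with private toilet and bath and cold running water.

    All arguments are unit counts (hypothetical names):
      not_dilap_total: all not-dilapidated tenant units
      not_dilap_hot:   not-dilapidated units with private toilet, bath and hot water
      not_dilap_cold:  not-dilapidated units with private toilet, bath and cold water
      dilap_total:     all dilapidated tenant units
      dilap_hot:       dilapidated units with private toilet, bath and hot water
    """
    # Step (1): cold-water units per 100 not-dilapidated units,
    # exclusive of those with hot running water.
    ratio_per_100 = 100.0 * not_dilap_cold / (not_dilap_total - not_dilap_hot)
    # Step (2): halve the ratio (the arbitrary decision noted in the text) and
    # apply it to dilapidated units exclusive of those with hot running water.
    halved_ratio = ratio_per_100 / 2.0
    return halved_ratio / 100.0 * (dilap_total - dilap_hot)


# Hypothetical counts, for illustration only.
estimate = estimate_dilapidated_cold_water_units(1000, 800, 50, 200, 40)
```

With these made-up counts the ratio in step (1) is 25 per 100, so half of it, 12.5 per 100, is applied to the 160 dilapidated units without hot running water.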

45 The mean percentage of tenant units in the 30 cities with central heat was 54.5 per cent in 1940 and 58.9 per
cent in 1950, an increase of 8 per cent which appears somewhat greater than the increase in the importance of units
with private toilet and bath. The increase was greater in the South than the North.

46 The importance of these facilities would tend to decline (1) if a higher percentage of the units lost from the
1940 tenant stock than the units remaining had these facilities, and (2) if conversion of units with these facilities 
occurred under conditions that lead to sharing of these facilities. On the other hand the relative level would tend 
to rise (1) if a lower percentage of the units lost from the 1940 stock than units remaining had these facilities, and 
(2) if facilities of these types had been added to units in structures remaining from 1940. Only indirect evidence is 
available on the importance of such changes. This is considered later. See equation (7) for example. 

47 This could occur because of additions to existing structures or shift of dwellings without these facilities from
the tenant to the owner stock. It seems of some interest to note that Scranton experienced some decline in population
from 1940 to 1950, and new construction was of minor importance. Yet both owner- and tenant-occupied dwelling 
units appear to have improved in quality. 

48 There is, however, no reason to assume that a nine per cent decrease in average rooms per dwelling would 
cause average rent paid to fall by nine per cent. Other things being equal such as facilities, quality of construction 
and maintenance, and space per room, a unit with 5 rooms seems unlikely to have a rent 25 per cent higher than 
one with four rooms. 

49 It should be noted, however, that there is no reason to assume that rooms in units of structures of one or
two units can be precisely equated with rooms in multi-family structures. 
may also have been related to change in the mixture of structure types. This in
turn may be related to the loss of structures built in 1919 or earlier. The average 
net effect of these changes for the 30 cities may well have resulted in an average 
improvement in the remaining inventory. This is indicated by the evidence 
examined below. 

Effect on Rents of Change in Stock. The variables describing changes in the stock
reviewed above are crude. Nevertheless the rent index together with a set of
three such variables explains 62 per cent of the variation among the 30 cities in
the relative rent in 1950 of stock remaining from 1940. Furthermore they
indicate that under conditions of little or no change in the rent index or in the
stock, little or no change in average rent would have occurred.

The correlation is as follows: 


X_{rc} = -.6583 + 1.1393 X_{ic} - .3319 X_{ac} + .06233 X_{bc} + .5184 X_{qc}
                                                        R^2 = .62        (7)


where X_{rc} and X_{ic} are the relative rent and the rent index of equation (6),
X_{ac} the number of units in the tenant stock remaining from 1940 per 100 units
in the 1940 stock, X_{bc} the decrease between 1940 and 1950 in the percentage of
tenant units in structures of one or two units, X_{qc} an index of change in quality
of units,50 all variables being expressed in log form.

With three types of change in stock held constant, an increase in the 
rent index of 10 per cent would have been accompanied by an increase in 
average rent of 11 per cent; and under conditions of no increase in the rent 
index and little or no change in stock,51 the estimated average rent paid in 1950
for stock remaining from 1940 is $98.50 for every $100 paid in 1940. Thus the 
rise in the average rent observed was compounded of both change in stock and 
price. 
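
The two figures quoted in this paragraph can be checked by evaluating equation (7) directly. The sketch below assumes base-10 logarithms (the text says only "log form") and uses the subscript labels ic, ac, bc and qc for the rent index, the stock remaining, the decrease in one- and two-unit structures, and the quality index; the function and variable names are illustrative.

```python
import math

# Coefficients of equation (7) as printed.
COEF = {"const": -0.6583, "ic": 1.1393, "ac": -0.3319, "bc": 0.06233, "qc": 0.5184}


def relative_rent(rent_index, units_remaining, one_two_unit_decrease, quality_index):
    """Predicted 1950 rent per $100 of 1940 rent, from equation (7).

    Assumption: all variables enter as base-10 logarithms.
    """
    log_rent = (
        COEF["const"]
        + COEF["ic"] * math.log10(rent_index)
        + COEF["ac"] * math.log10(units_remaining)
        + COEF["bc"] * math.log10(one_two_unit_decrease)
        + COEF["qc"] * math.log10(quality_index)
    )
    return 10 ** log_rent


# "No change" conditions: indexes at 100 and a decrease of 1 per cent in the
# importance of one- and two-unit structures (log of 1 is zero).
baseline = relative_rent(100, 100, 1, 100)

# Same stock conditions with the rent index 10 per cent higher.
higher = relative_rent(110, 100, 1, 100)
pct_increase = 100 * (higher / baseline - 1)
```

Under the base-10 assumption, `baseline` comes out close to the $98.50 quoted in the text, and `pct_increase` is between 11 and 12 per cent, in line with the stated 11 per cent.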

Several characteristics of the coefficients of equation (7) seem of interest. 
For example the more nearly the number of units in the stock remaining
approached or exceeded the number of units in the 1940 stock (X_{ac}) the less was
the rise in rent. The inference is that units lost from the stock were relatively
low in rent. If units disappearing were demolished this is reasonable. If they
were lost to owner occupancy a variety of conditions may have contributed to
the decrease, including an increase in the percentages of units remaining with 
utilities included in the rent. One can also infer from the negative sign of 


50 See footnote 44 for description of this index.
51 Conditions assumed are as follows:

Variable    Assumption
X_{ic}      100    no change
X_{ac}      100    no change
X_{bc}        1    decrease of 1 per cent in the importance of units in structures of one or two units
X_{qc}      100    no change


52 X_{ic} and X_{ac} explain 57 per cent of the variation in X_{rc}. X_{ac} appears to carry much of the effect of other
variables describing change in stock. Without X_{bc} and X_{qc}, the coefficient of X_{ac} is appreciably higher than that
shown in equation (7).

53 This may account for a marked intercorrelation of X_{ac} and X_{bc}.
X_{ac} that a tendency of units added by conversion to raise average rent was of
minor importance compared to the tendency of loss of the units to increase
average rent of stock remaining. The greater the decrease over the decade in
the importance of units in structures of one or two units, i.e. the higher X_{bc},
the greater was the increase in rent paid. This tendency may be related to the
greater importance of units with utilities included in the rent that
accompanied the decline in the importance of units in structures of one or two units.
Furthermore the greater X_{bc} the greater may have been the increase in owner
occupancy among those relatively low in the income distribution. The more
consumer units of relatively low economic level were seeking or being propelled
into owner occupancy and drawing on the tenant stock, the greater was likely
to be the loss from the tenant stock of low-quality units.

Cities with an increase during the forties in the percentage of the 1940 stock 
with private toilet and bath had a greater increase in rent than cities reporting 
a decrease.55 This difference may reflect the higher rent because of these
facilities and also a direct relationship between such structural facilities and
availability of central heating and utilities included in rent.


5. CHANGE IN RENTS PAID IN RURAL NONFARM AREAS 


The percentage increase in average rent in rural nonfarm areas was much 
greater than that of urban places. One characteristic of the data probably ac- 
counted for much of this difference, i.e. the inclusion in 1940 and omission in 
1950 of estimated rent of rent-free tenant units. In 1950 rent-free units were
more important in rural nonfarm than other areas and their average quality
seems likely to have been relatively low.57 This difference in quality would


54 X_{bc} is of course only a crude index of the change in importance of units with utilities included in rent.

55 This generalization applies as well to the years 1940 to 1947. Rents for 1947 were reported for 20 metro
areas. Relative rent of 1947 in relation to 1940 had only a slight positive relation to the rent index for 1947 of the 
main cities within the metro areas. However, increase in the importance of units with private toilet and bath and 
decrease in the percentage of tenant units in structures with one or two units explain 48 per cent of the variation 
among the metro areas in the ratio of relative rents to the rent index, and under conditions of no change in stock, 
expected rent paid increased 1.7 per cent more than the index. 

Data from the 1947 survey provided the main evidence used by Maisel in reaching the conclusion that the 
rent index of 1947 greatly understated increase in the price of rental housing. The relationship described here sug- 
gests that the rent index even for the years 1940 to 1947 was a reasonably good index of the monthly price of rental 
housing. (See footnote 14 for source of these data.) 

56 The addition of change in rooms per dwelling to equation (7) increased R^2 little, and had little effect on the
regression coefficients. Change in rooms per dwelling because of losses from the tenant stock was undoubtedly 
compounded of many things. It would be affected by quality difference related to age of structure and to utilities 
included in rent. In addition reduction in rooms per unit because of conversions probably had a very different 
relation to average rent than the losses. 

57 That rent-free dwellings tend to be of relatively low quality and hence to have had relatively low rent in 
1940 is indicated by data for 1950. The percentage of all tenant units and of specified subsets that were of quality-A 
is as follows: 


                                                    Outside Metro Areas
                                                  Urban      Rural Nonfarm
All                                                58.2          32.2
Those with contract rent reported                  58.5          36.5
Contract rent not reported
  (i.e. rent-free units plus the “no reports”)     55.3          22.3


These data indicate that in each type of community the rent-free units were of lower average quality than other 
contribute to an overestimate of rent changes of comparable rents arising from
the treatment of rent-free units. 

The importance of rent-free units in 1950 accounted for a considerable
proportion of the variation58 between states in the relative rent of 1950 compared
to 1940.59 The correlation for a set of 36 states60 is as follows:


X_{rn} = 2.0807 + .009698 X_f        r^2 = .35        (8)


where X_{rn} is the log of mean contract rent in 1950 per $100 of mean contract
or estimated rent of rent-free units in 1940 and X_f is the percentage of tenant
units with free rent or rent not reported in 1950.

The coefficient of determination is highly significant. Furthermore with
X_f equal to zero, X_{rn} is 119.7. In other words average rent increased by only
19.7 per cent over the decade. This is very small compared to that indicated 
by census data. Average rents reported for the 36 states showed an increase of 
113 per cent. It seems highly probable, however, that equation (8) overstates 
the effect on average rent of difference in the treatment of the rent-free units, 
because their importance is correlated with improvement in quality and with 
increase in the price of rental housing. 

Three variables explain 75 per cent of the variation in relative rent among
24 states.61 The correlation is as follows:


X_{rn} = -1.3493 + 1.0213 X_{in} + .6338 X_{qn} + .00789 X_f        R^2 = .75        (9)


where X_{rn} is the relative rent of 1950 compared to 1940, X_{in} is the estimated
rent index of the city or cities62 within the respective states, X_{qn} is an index of


units. However, only in rural nonfarm areas did this difference affect appreciably the estimate of average quality 
of all dwellings. 

Furthermore there was a tendency for nonwhite households to concentrate in such units. The percentage of all 
units for which no rent was reported, by type of community and by race, for the United States in general in 1950, 
was as follows: 


Type of Nonfarm By 
Community All White Nonwhite 
All 7.5 7.2 9.7 
Urban 4.0 4.0 3.9 
Rural Nonfarm 23.8 21.9 39.4 


Thus about one-quarter of the tenant households in rural nonfarm areas had rent-free units in 1950—about one- 
fifth of the white and close to two-fifths of the nonwhite households. 

(These data are from Census of Housing: 1950, Vol. 2, Chapter 1, Tables 2 and 14. See footnote 31 for definition 
of quality-A units.) 

58 In states for which this variation was examined the range was from $135 in New York to $303 in Florida 
(i.e. mean rent in 1950 per $100 of rent in 1940). 

59 The importance of rent-free units was probably much the same in 1940 as in 1950. This likelihood is
indicated by the similarity of the percentage of units, by year of construction, in the “no report” category in 1950, i.e.
29.5 and 22.4 per cent, respectively, for units in the tenant stock remaining from 1940 and for those constructed 
during the forties. 

60 These include the 24 states having one or more cities with a rent-index plus 12 additional states so selected
that at least 2 states were included from each of the nine census divisions. 

61 These are the states with one or more cities having a rent index.

62 Where a rent index was reported for two or more cities within a state the index used here is the mean, each city
having a weight of one.
relative quality of units in 1950 compared to that of 1940,63 and X_f the
percentage of units for which no rent was reported in 1950, all variables being
expressed in log form except X_f.

This equation indicates that the greater the rise in the rent index of a city 
or the cities within states, the greater the improvement in the quality of tenant 
units and the greater the importance of rent-free units, the greater tended to be 
the rise in rent of rural nonfarm places. It furthermore indicates that a 10 per 
cent rise in the rent index tended to be accompanied by a ten per cent rise in 
the rents in rural nonfarm places. This is very similar to the tendency indicated 
for 30 cities, shown by equation (7). Thus it seems probable that conditions 
affecting rents in rural nonfarm areas and cities within states had much in 
common. 

Difference in treatment of rent-free units in the 1940 and 1950 censuses
(X_f), plus the greater improvement65 in the tenant units of rural nonfarm (X_{qn})
than urban areas, probably accounts for the greater rise in average rent in rural
nonfarm than urban places. Under conditions of no increase in the rent index,
no increase in the percentage of units with private toilet and bath and no
units rent free in 1950 the expected relative rent (X_{rn}) as indicated by equation
(9) is $97. In other words rents in 1950 would have been about the same as in 
1940. 


6. CONCLUDING REMARKS 


The evidence in this article on change in average rents of nonfarm dwelling 
units between 1940 and 1950 indicates several things that deserve attention: 
(a) that the rent component of the CPI is a reliable price index for tenant 
dwelling units from 1940 to 1950 of stock remaining from 1940 not only in the 
large cities but for the nonfarm population in general, (b) that by 1950 it never- 


63 The categories describing plumbing facilities of units were not strictly comparable in the 1940 and 1950
censuses. This comparison pertains to percentage of units with private toilet and bath in 1940, and percentage of 
units not dilapidated in 1950 with private toilet and bath and running water and of those dilapidated and with 
private toilet and bath and hot running water. 

64 When this variable is expressed in log form the R^2 is slightly higher than that shown for equation (9).
However an extrapolation beyond the range of the observations, under the assumption of no difference in the
treatment of rent-free units in 1940 and 1950, indicates that the interrelation of this and other variables at low levels
is very nonlinear in log form. 

65 Estimated percentages of tenant units with private toilet and bath are as follows:


1940 1950 1950 (with 1940 equal to 100) 
All nonfarm 63.0 69.9 111 
Rural nonfarm 
(Outside of metro areas in 1950) 31.1 36.4 117 


These data are from the Census of Housing: 1940, Vol. 2, Part 1, Table 6 and Census of Housing: 1950, Vol. 2, 
Chapter 1, Tables A-2 and D-2. 

66 Data available permit an estimate for urban places similar to that of equation (9) for rural nonfarm areas.
They have, however, not been examined. This judgment assumes that equation (7) is fairly representative of urban 
places in general. 

One further estimate made supports this conclusion. This used data reported to the Census of Housing: 1950 
for three types of places, i.e., inside metro areas, urban outside metro areas and rural nonfarm outside metro areas, 
for the nine main geographic divisions of the United States. Variables for the 27 observations similar to those in 
equation (3) yielded quite similar regression coefficients and an R^2 of .83. The rent index used combined those of
cities located within divisions, using as weights the estimated number of tenant dwellings in respective types of 
communities within the states in which the cities were located for which an index is provided. 
theless understated appreciably the rise in the price of rental housing in general
because of the entrance into the market of new units above the level of rents of 
comparable quality, (c) that some of the rise in the average rent paid between 
1940 and 1950 of units continuing from the stock of 1940 appears to be related 
to improvement in quality of rental housing and increase in services included 
in the rent. Demolition and transfer of 1-unit structures to owner occupancy
appear to have contributed to this, and (d) the greater rise between 1940
and 1950 reported by the census data for the rural nonfarm than the urban 
sector appears to have been largely the result of the omission in 1950 of imputed 
rent of rent-free dwellings.

The analysis in general including its reversal of findings of earlier investi- 
gators demonstrates the need for careful attention to concept and for sustained 
analysis in order to sort out the effect of various factors. Furthermore it reveals 
that the rental market is very orderly in the sense that tracing the play of 
market forces provides the clues to its structure, and that the rent index is a 
suitable measure of price change.


THE DEMAND FOR FERTILIZER IN 1954: 
AN INTER-STATE STUDY* 


Zvi GRILICHES 
University of Chicago 


This paper relates fertilizer use in different states in 1954 to fertilizer 
prices, the value of output per acre, the hired farm labor wage rate, the 
cash rent paid per acre, and the nitrogen content of soil. The results 
indicate that fertilizer is a substitute for land and a complement to 
labor. The results are also consistent with the previously published 
study of fertilizer time series. An attempt to improve these results 
through the introduction of dynamic considerations proves unsuccess- 
ful. 


1. INTRODUCTION


The results of an analysis of aggregate U. S. demand for fertilizer during the
period 1911-1956 were presented in a previous paper.1 In this paper the
attention is focused on the state level and on the cross-sectional relationships
that existed in 1954. The analysis of fertilizer use by states was undertaken for
two major reasons: First, it is of some intrinsic interest to see whether one can
explain the wide differences in fertilizer use with the help of a limited number of
economic variables;2 second, I hope to use the results of this study to
corroborate or to contradict some of the findings of the time series study.3


2. THE MODEL, THE DATA, AND THE VARIABLES4


As in the previous paper, fertilizer use is viewed as a function of the price 
paid for fertilizer, the prices received for farm products, and the prices paid for 
other inputs. Again, rather than measuring fertilizer use by the total weight of 
all fertilizers used, I chose to measure it by the weight of the principal plant 
nutrients contained therein. In one case this is a simple aggregate of total nitro- 
gen, phosphoric acid, and potash used in each state; and in another it is a price 
weighted index of these three nutrients. After some experimentation with al- 
ternative functional forms, a function linear in the logarithms of the variables 
was chosen. At first, the inter-state differences in plant nutrient use are inter- 
preted as representing an adjustment to long run differences in relative prices. 
However, this assumption is modified towards the end of this paper to allow
us to distinguish between a “short run” and a “long run” adjustment. 


* This paper is a part of a larger study of changing inputs in U. S. agriculture financed by a grant from the
National Science Foundation. 

1 Zvi Griliches, “The demand for fertilizer: an economic interpretation of a technical change,” Journal of Farm 
Economics, 40 (1958), pp. 591-605. 

2 The only other published study of fertilizer use by states is by A. L. Mehring, “Fertilizer nitrogen consump- 
tion,” Industrial and Engineering Chemistry, Vol. 37 (1945), pp. 289-295, in which nitrogen consumption per acre 
was related to crop value per acre, the nitrogen content of soil, and the proportion of total cash income from live- 
stock. It did not include either the price of fertilizer or the prices of other inputs as variables and was limited to the 
consideration of nitrogen use alone. For additional references to previous work in this area see Zvi Griliches, “The 
demand for fertilizer . . . ,” op. cit. 

3 The results of further studies of fertilizer use in the U.S. are given in Zvi Griliches, “Distributed lags, dis- 
aggregation, and regional demand functions for fertilizer,” Journal of Farm Economics, 41 (Feb., 1959), and “The 
demand for inputs in agriculture and a derived supply elasticity,” Journal of Farm Economics, 41 (May, 1959). 

4 For a more detailed discussion of this model and of alternative ways of defining and measuring the various 
variables, see Zvi Griliches, “The demand for fertilizer . . . ,” op. cit. 
This study uses two somewhat different sources of data. One set of
calculations is based largely on data derived from the 1954 Census of Agriculture.
The second set of calculations is based largely on figures derived from various 
reports of the Agricultural Research Service. The basic difference between the 
two is in the definition of and the source for the dependent variable. Con- 
ceptually, they are similar. In both cases I am after a measure of “plant 
nutrients used per acre,” but the measures of “plant nutrients” and “acres” 
differ. Since states are so very different in size some scaling is inescapable. 
A per acre unit though arbitrary is reasonable and convenient. 

Using the Census data, the dependent variable Y was defined as pounds of 
principal plant nutrients used in 1954 per acre of all cropland harvested and 
non-woodland pasture. This figure was computed by multiplying total pounds 
used per fertilized acre by the proportion of all acres fertilized, and by the aver- 
age nutrient content (proportion) of the fertilizer sold in the particular state.5 
This definition glosses over the distinction between pounds used per acre fer- 
tilized and the per cent of all acres fertilized. I shall not investigate here whether 
or not this is a useful distinction. The unit of measurement in this case is a fixed
weight index with pounds of N, P2O5, and K2O each getting equal weight. In
time series analysis, the mix did not change enough to make a distinction be- 
tween the plant nutrient unit and a price weighted quantity aggregate useful, 
but this is not necessarily the case in cross section analysis. Not only the over- 
all nutrient content of fertilizer, but also the proportions in which nutrients are 
used, change substantially from state to state. Therefore, an alternative, price- 
weighted measure of fertilizer use in each state, Y’, was constructed using ARS 
plant nutrient use data and price-weights derived from an analysis of fertilizer 
prices. For 1954, the Agricultural Research Service has estimated the total ton- 
nage of each of the three principal plant nutrients used on farms in each state, 
the total acreage fertilized, and the percentage that this acreage represents of 
total cropland harvested and of improved pasture.6 Using the derived U. S.
price weights and the separate plant nutrient tonnages, a price weighted
aggregate of total plant nutrients used in each state was computed.7 Dividing this
figure by total acres of cropland harvested and improved pasture (acres fer- 


5 Source of the first two figures: U. S. Census of Agriculture: 1954, Vol. II, Washington: Government Printing
Office, pp. 308-9. The last figure, the per cent nutrient content, is from USDA, ARS, Commercial Fertilizer Con- 
sumption in the U. S., 1953-1954, Washington: Government Printing Office, 1955. It refers to the year ending June 
30, 1954, rather than to the calendar year, but this should not affect the results much, as the per cent nutrient 
content of fertilizer changed very little between 1953-54 and 1954-55. This estimate of plant nutrient pounds used 
per acre will not coincide with the recently published ARS estimates in ARS 43-49, “Fertilizer and Lime Application 
on Farms, 1954,” April 1957, Table 500, for several reasons. The Census data are based on a 20 per cent sample, 
whereas the ARS estimates are based on shipment data. For the U. S. as a whole, the reported total consumption
by the Census is about 15 per cent lower than the ARS estimate. The difference is largely accounted for by the 
narrower definition of “fertilizer” used by the Census and by the exclusion of fertilizer consumption for non-farm 
uses (lawns, parks, etc.). More serious is the difference in the denominators used. The Census counts in the de- 
nominator all cropland harvested, cropland pastured, and all non-woodland pasture. The ARS estimates include 
only the first. Because of this, these estimates differ by a factor of 2 to 3 for some states. A similar set of ARS estimates, 
however, is used in the second set of computations reported below. 

6 USDA, ARS, Fertilizer Used on Crops and Pasture in the U. S., 1954 Estimates, Statistical Bulletin No. 216,
Washington: Government Printing Office, 1957. 

7 The weights used were: F = 1.62N + .93P + .45K, where N stands for nitrogen, P for phosphoric acid (P2O5), K
for potash (K2O), and F is the resulting price weighted aggregate. The weights were derived from the coefficients of
a regression of 24 U. S. average prices per ton of different mixed fertilizers on their respective percentage contents
of N, P2O5, and K2O in 1955. For a description of this regression, see Zvi Griliches, “The Demand for Fertilizer . . . ,”
loc. cit., pp. 599-600.
tilized divided by the proportion of all acres fertilized), we get Y’, a price
weighted measure of pounds of plant nutrients used per acre of cropland
harvested and improved pasture.8

There are several differences between these two estimates of plant nutrients 
used per acre in each state, between the measure based on Census data, Y, 
and the measure based on ARS data, Y’. The ARS estimate of fertilizer use 
differs both because of slight differences in definition and because the ARS 
used other sources, mainly shipment data, besides the Census figures to arrive 
at its estimate. Also, the second measure is price weighted, whereas the first is 
not. In the first measure each of the nutrients gets equal weight. Finally, the 
denominators are very different. In the first measure we divide by all cropland 
harvested and all non-woodland pasture. In the second, we used only improved 
pasture rather than all non-woodland pasture, together with all cropland har- 
vested, in the denominator. This last difference by itself results in these esti- 
mates differing by a factor of 2 for some states. Both measures will be used in 
the analysis that follows. 
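
The construction of the two dependent variables can be sketched as below. The price weights 1.62, .93 and .45 are those given in footnote 7; the function names, argument names, and the input figures are hypothetical and chosen only to keep the arithmetic easy to follow. Nutrient quantities are taken here in pounds, which is an assumption about units made for the illustration.

```python
def census_nutrients_per_acre(lbs_per_fertilized_acre, share_acres_fertilized, nutrient_content):
    """Y: pounds of plant nutrients per acre, equal weights (Census-based measure).

    share_acres_fertilized and nutrient_content are proportions between 0 and 1.
    """
    return lbs_per_fertilized_acre * share_acres_fertilized * nutrient_content


def price_weighted_aggregate(n, p, k):
    """F: price weighted aggregate of N, P2O5 and K2O (weights from footnote 7)."""
    return 1.62 * n + 0.93 * p + 0.45 * k


def ars_nutrients_per_acre(n, p, k, acres_fertilized, pct_acres_fertilized):
    """Y': price weighted nutrients per acre of cropland harvested and improved pasture."""
    # Total acres are recovered as acres fertilized divided by the
    # proportion of all acres fertilized.
    total_acres = acres_fertilized / (pct_acres_fertilized / 100.0)
    return price_weighted_aggregate(n, p, k) / total_acres


# Hypothetical state: 400 lb of fertilizer per fertilized acre, one quarter of
# acres fertilized, 30 per cent nutrient content.
y = census_nutrients_per_acre(400.0, 0.25, 0.30)

# Hypothetical state: 100 lb each of N, P2O5 and K2O on 50 fertilized acres,
# which are 25 per cent of all acres.
y_prime = ars_nutrients_per_acre(100.0, 100.0, 100.0, 50.0, 25.0)
```

Because the three weights sum to 3.00, equal quantities of the three nutrients leave the aggregate unchanged in total; the two measures diverge only when the nutrient mix or the acreage base differs.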

For the set of computations utilizing Census data, the price paid per plant 
nutrient unit, P;, is derived by dividing the Census figure on expenditures per 
ton by the average nutrient content (proportion) of fertilizers sold in the state. 
Again, this price concept is subject to the deficiency which arises from giving 
all three nutrients the same weight. An alternative price per nutrient unit, P,’, 
was calculated by dividing the estimated total expenditure on fertilizers by 
farmers in each state by F, the price weighted aggregate of plant nutrients used 
in the state.9 The first measure of price was used together with the Census
based estimate of plant nutrients used; the second, with the ARS based esti- 
mates. In further discussion, wherever the second definition of price or quantity 
is used, that variable is denoted by a prime. 

The price of fertilizer relative to the price of products, X_1, is arrived at by
dividing the price of fertilizer by the value of all crops produced per acre of 
cropland. The latter figure was calculated by adding together the value of 
production of field crops and the value of all vegetables, fruits, and nuts, and 
horticultural specialties sold in 1954, and dividing it by total acres of cropland 
in farms.10 This is not strictly a measure of the price of fertilizer relative to the
price of output, but rather a measure of the price of fertilizer relative to the 
value of output per acre. It includes not only the impact of cross-sectional dif- 
ferences in product prices, but also the impact of cross-sectional differences in 
crop yields per acre, differences in the quality of land, and the level of other 
inputs. No cross-sectional index of prices received is available, and besides, the 
measure used is quite reasonable. The higher the value of output per unit of
land, the higher we would expect the use of fertilizer on that unit of land
to be.

Of all factor prices, only those of labor and land were readily available. The 


8 Y' = \frac{1.62N + .93P + .45K}{\text{Acres fertilized}} \times \frac{\text{Per cent of all acres fertilized}}{100}


9 The unpublished state expenditure data on fertilizer were made available by the Farm Income Branch of the
Agricultural Marketing Service.
10 Source: Census of Agriculture, 1954, Vol. II, loc. cit.
price of fertilizer was divided by the average cash wage paid per day in 1954
to arrive at the price of fertilizer relative to the price of labor, X_2. X_3, the price
of fertilizer relative to the price of land, was arrived at by dividing the price of
fertilizer by the average cash rent paid per acre in 1954.11
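
As a rough illustration of how the three relative price variables are formed from the Census-based price, consider the sketch below. The function names, the labels x1, x2 and x3, and all input figures are hypothetical; only the ratios themselves follow the definitions in the text.

```python
def price_per_nutrient_unit(expenditure_per_ton, nutrient_content):
    """Price paid per unit of plant nutrients: expenditure per ton of fertilizer
    divided by the average nutrient content (a proportion between 0 and 1)."""
    return expenditure_per_ton / nutrient_content


def relative_prices(fertilizer_price, crop_value_per_acre, daily_cash_wage, cash_rent_per_acre):
    """Return the fertilizer price relative to output value, labor and land."""
    x1 = fertilizer_price / crop_value_per_acre  # relative to value of crops per acre
    x2 = fertilizer_price / daily_cash_wage      # relative to the price of labor
    x3 = fertilizer_price / cash_rent_per_acre   # relative to the price of land
    return x1, x2, x3


# Hypothetical state: $60 spent per ton of fertilizer containing 30 per cent nutrients,
# $50 of crops per acre, a $5 daily wage and $10 cash rent per acre.
price = price_per_nutrient_unit(60.0, 0.30)
x1, x2, x3 = relative_prices(price, 50.0, 5.0, 10.0)
```

Since the fertilizer price is the common numerator, states with similar fertilizer prices will differ in these ratios mainly through the denominators.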

A word about the form of these variables. It is more usual to deflate all the 
prices by the price of the product or by an external deflator; theoretically, how- 
ever, it does not matter which of the prices is used as a deflator. It seemed 
natural, in time series analysis, to ask about the real price of fertilizer in terms 
of (1) the price of the product and (2) the prices paid for other factors. To pre- 
serve continuity, the variables used in the cross-sectional analysis were defined 
analogously. It should be noted, however, that the price paid per plant nutrient
unit, using either $P_f$ or $P_f'$, does not vary much between states. Therefore,
most of the results described below are due to the “work” of the denominators
in these variables (crop value per acre, wage, and cash rent) rather than to the
numerator (price paid per plant nutrient unit).

To take into account the wide differences in the soils of the various states,
$X_5$, the average per cent content of nitrogen in the soil, was added to the
analysis.¹² Presumably, soil nitrogen is to some extent a substitute for
commercial nitrogen.

The form of the equation used is the same as the one used in time series
analysis: linear in the logarithms of the variables. It does not, however, include
$Y_{t-1}$, the previous year’s consumption, as an independent variable. $Y_{t-1}$ was
left out because the data available for 1953 were not directly comparable to the
Census data and because it was felt that cross-sectional differences in fertilizer 
prices and consumption are of a somewhat more long-run nature than the year 
to year differences in national prices and aggregate consumption. This, however, 
is reconsidered below. The unit of observation is a state. Hence, we have 48 
observations. However, because the ARS data were incomplete for some states, 
Arizona, Nevada, and New Mexico were left out from the second set of com- 
putations, reducing the number of observations to 45. 


3. THE RESULTS 


The results of all these calculations are presented in Table 381. Equations
(1) through (2a) report on calculations using the Census data; equation (3)
uses the ARS based concepts and is limited to 45 states. For comparison
purposes, (2) was recomputed excluding the same three states (Arizona,
Nevada, and New Mexico). The results are identified as (2a). It is easily seen that
most of the difference in fit between (2) and (3) is due to the exclusion of these
three states in computing (3). Nevertheless, the use of a price weighted concept
of plant nutrients and the deletion of non-improved pasture from the definition
of “acres” result in a somewhat better fit and in higher significance levels for
most of the variables. In particular, $X_5$, the nitrogen content of soil, whose
contribution was not statistically significant in the first set of computations, is
significant in the second. This is the result of the greater weight given to
nitrogen by the weighting procedure.


11 Loc. cit.
12 Source: A. L. Mehring, op. cit.
TABLE 381. REGRESSION RESULTS: U.S. DEMAND FOR FERTILIZER
BY STATES, 1954

      Dependent              Coefficients of
 N    variable      x1        x2        x4        x5        R²     Regression

48       y        -.948     1.646    -1.300                .767       (1)
                  (.337)    (.487)    (.222)
48       y        -.894     1.534    -1.326     -.103      .768       (2)
                  (.389)    (.629)    (.242)    (.360)
45       y        -.781     1.395    -1.264     -.428      .882       (2a)
                  (.320)    (.4138)   (.192)    (.244)

                    x1'       x2'       x4'       x5
45       y'       -.841      .992     -.607     -.552      .894       (3)
                  (.209)    (.194)    (.129)    (.155)


Notes:
The numbers in parentheses are the calculated standard errors. Small letters denote logarithms of the variables.

Y: Pounds of principal plant nutrients used per acre of cropland harvested and non-woodland pasture (Census
data).
Y': Price weighted pounds of principal plant nutrients used per acre of cropland harvested and improved
pasture.
P_f: Average expenditure per ton of fertilizer (Census) divided by the average percentage nutrient content of
fertilizer. Price per plant nutrient unit.
P_f': Total expenditures on fertilizers by farmers divided by a price weighted aggregate of plant nutrients used.
Price per plant nutrient unit, with N, P, and K having different weights.
X1: P_f divided by the value of crops produced per acre of cropland.
X1': P_f' divided by the value of crops produced per acre of cropland.
X2: P_f divided by the average cash wage paid per day to hired farm workers.
X2': P_f' divided by the average cash wage paid per day to hired farm workers.
X4: P_f divided by the average cash rent paid per acre.
X4': P_f' divided by the average cash rent paid per acre.
X5: Average per cent nitrogen content of soil.


Between seventy-five and ninety per cent of the inter-state variation in plant 
nutrient use per acre is explained with the help of a few, rather simple, economic 
variables. For cross-sectional analyses, this is a high proportion. Also, all of 
the coefficients have the expected sign. In particular, they confirm the intuitive 
notion that labor is a complement to and land is a substitute for fertilizer. These 
coefficients imply (approximately) an elasticity of +.9 with respect to crop 
value per acre, a cross-elasticity of —.9 with respect to the price of labor, and 
a cross-elasticity of +.6 with respect to the price of land services.¹³


13 An elasticity measures the proportionate change in one variable associated with a given proportionate change
in another variable $\left(\text{elasticity} = \frac{\partial Y}{\partial X} \cdot \frac{X}{Y}\right)$.
The coefficients of functions linear in the logarithms of the variables are also the elasticities of the dependent variable
with respect to changes in any one of the independent variables. Remembering that both the price of labor and
the price of land are denominators in $X_2$ and $X_4$ respectively, the statement in the text means that on the average
and approximately a 10 per cent increase in the farm wage rate is associated with a 9 per cent decrease in plant
nutrients used per acre, whereas a 10 per cent increase in the price of land (cash rent) is associated with a 6 per
cent increase in the use of plant nutrients per acre.
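
The arithmetic behind this reading of the coefficients can be checked with a short sketch. It assumes the regression (3) coefficients reported in Table 381 (.992 on the fertilizer-to-wage price ratio and -.607 on the fertilizer-to-rent price ratio) and takes the log-linear form literally; the function name `pct_change_in_use` is illustrative only and is not part of the original analysis.

```python
# Coefficients of regression (3), Table 381 (equation linear in logarithms).
COEF_X2 = 0.992    # on log(fertilizer price / daily cash wage)
COEF_X4 = -0.607   # on log(fertilizer price / cash rent per acre)


def pct_change_in_use(coef, pct_rise_in_denominator):
    """Percentage change in nutrient use per acre when the price that sits
    in the DENOMINATOR of a ratio variable rises by the given percentage,
    all other variables held constant."""
    # A rise in the denominator lowers the ratio by this factor.
    ratio_factor = 1.0 / (1.0 + pct_rise_in_denominator / 100.0)
    # In a log-linear equation, use is proportional to ratio ** coef.
    return (ratio_factor ** coef - 1.0) * 100.0


wage_effect = pct_change_in_use(COEF_X2, 10.0)   # 10 per cent higher wage
rent_effect = pct_change_in_use(COEF_X4, 10.0)   # 10 per cent higher rent
print(round(wage_effect, 1), round(rent_effect, 1))
```

Under these assumptions the two figures come out at roughly minus 9 and plus 6 per cent, which is the sense in which labor behaves as a complement to, and land as a substitute for, fertilizer.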
4. DYNAMIC CONSIDERATIONS


The previous sections treated inter-state differences in plant nutrient use as 
if they represented long run adjustments to differences in relative prices. We 
know from our time series analysis, however, that relative prices have been 
changing and that for the U.S. as a whole, equilibrium was not achieved in
one year. But inter-state differentials may reflect also short run considerations. 
A way of dealing with such a problem was outlined in great detail in the pre- 
vious paper. Therefore, only a brief description of it will be given here. 

The basic idea is that relative prices determine “long run” or “desired”
consumption, and that adjustment to a disequilibrium is not instantaneous. It
is assumed that the adjustment during any one period is some function of the
difference between the “desired” and the “actual” or current level of use. In
particular, it is assumed that the adjustment equation is linear in the logarithms
of desired and actual consumption. That is, using small letters to denote the
logarithms of the variables: $y_t^* = f(p_t)$, i.e., desired use is a function of relative
prices, where $p$ stands for a vector of all the relevant prices, but
$y_t - y_{t-1} = b(y_t^* - y_{t-1})$, i.e., the actual change in fertilizer use is proportional to the
difference between desired use and the previous level of use. Substituting the first
equation into the second, and solving for $y_t$, we get $y_t = b f(p_t) + (1-b) y_{t-1}$.
Operationally, this means that we add a lagged value of the dependent variable
into the regression, and that its coefficient, $1-b$, provides us with an estimate of the
adjustment coefficient $b$.

Because cross-sectional differentials do not change as rapidly as annual ag- 
gregate figures, a lag longer than just one year is necessary, if we do not want to 
introduce a variable which is practically equivalent to the dependent variable. I 
chose a 5 year lag, introducing an estimate of the 1949 weighted plant nutrient 
consumption per acre as an additional independent variable into the regression.¹⁴

Table 383 reproduces regression (3) and compares it to regression (4), which
includes plant nutrient consumption per acre in 1949 as an independent
variable. It is easily seen that the introduction of the lagged dependent variable
reduces both the absolute size of all the other coefficients and their statistical
significance. When these coefficients are translated back into long run
elasticities by dividing them through by 1 minus the coefficient of $Y'_{1949}$ (.591), most
of them are still smaller than the comparable coefficients in regression (3).
The difference may not be statistically significant, and it is substantial only for
$X_1'$ and $X_2'$, but it is puzzling. $Y'_{1949}$ was introduced on the assumption that the
cross-sectional differences in 1954 did not reflect adequately long run
differentials in use. On the assumption that long run elasticities are larger than short
run, the last line in Table 383 should have come out larger, in absolute size, than


14 The earliest date for which comparable data were available was 1949. In particular, it was better than 1950,
because the latter year suffered from the Korean disturbance. Figures on plant nutrient consumption by states for 
the calendar year 1949 were obtained from USDA, Statistics on Fertilizers and Liming Materials in the U. S., Statisti- 
cal Bulletin No. 191, Washington: Government Printing Office, 1957, and were adjusted for non-farm use of fer- 
tilizers on the basis of information available for 1947 and 1950 in Statistical Bulletin No. 216 (op. cit.) and ag- 
gregated using the 1955 price weights described above. Total cropland harvested and improved pasture in 1949 were 
calculated from Census data, using 1949 data on cropland harvested and pastured and other pasture and the 1954 
percentage of improved pasture to “other” pasture. 
TABLE 383: REGRESSION RESULTS: U.S. DEMAND FOR
FERTILIZER BY STATES, 1954. N = 45

Dependent                  Coefficients of
variable        x1'      x2'      x4'      x5     y'1949     R²      b     Regression

y'1954        -.841     .992    -.607    -.552              .894   1.00       (3)
              (.209)   (.194)   (.129)   (.155)
y'1954        -.267     .285    -.296    -.338     .409     .949    .591      (4)
              (.171)   (.174)   (.101)   (.114)   (.063)

Estimates of “long
run” coefficients
in (4)*       -.452     .482    -.501    -.572


Notes:

See notes to Table 381 for the definition of variables and the text for their sources. Small letters stand for
logarithms of the variables. The figures in parentheses are the calculated standard errors of the coefficients.

* Line (4) divided by 0.591.


the first one. It did not. This may imply that an “adjustment model” does not 
represent a very useful approach to cross-sectional data. However, more evi- 
dence is needed on this point. Perhaps a more useful approach would consist 
of doing several cross sections in time, and using covariance analysis to estimate 
the coefficients after allowing each state to have a constant term of its own. 
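
The conversion from the short run coefficients of regression (4) to the “long run” row of Table 383 can be sketched as follows. The numbers are those printed in the table; the dictionary keys and the function name `long_run_coefficients` are illustrative labels introduced here, not notation from the original computations.

```python
# Short run coefficients of regression (4), Table 383.
short_run = {"x1'": -0.267, "x2'": 0.285, "x4'": -0.296, "x5": -0.338}
lag_coefficient = 0.409   # coefficient of the lagged (1949) dependent variable


def long_run_coefficients(short_run, lag_coefficient):
    """Return the adjustment coefficient b and the implied long run
    coefficients of a partial adjustment regression."""
    # The lagged variable enters with coefficient (1 - b).
    b = 1.0 - lag_coefficient
    # Each short run coefficient equals b times its long run counterpart.
    return b, {name: value / b for name, value in short_run.items()}


b, long_run = long_run_coefficients(short_run, lag_coefficient)
for name, value in long_run.items():
    print(name, round(value, 3))
```

Comparing the output with the coefficients of regression (3) shows the puzzle discussed in the text: only the soil nitrogen coefficient comes out larger in absolute size.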


5. COMPARISON WITH TIME SERIES RESULTS 


No direct comparisons with time series results are possible, because none of 
the variables are defined similarly in both cases. Nevertheless, some qualita- 
tive comparisons are possible. The estimated cross-sectional elasticity of plant 
nutrient consumption to changes in the price of fertilizer relative to the price 
of product (—.9) is substantially lower than the time series estimate of the long 
run elasticity (—2.0), but it is quite a bit higher than the short run time series 
estimate (—.5). Remembering that the cross-sectional analysis includes, 
whereas the time series analysis does not, prices of other inputs as additional 
variables, the difference between the two estimates is in the right direction. 
Because input prices and product prices are correlated, the omission of input
prices in time series analysis would tend to result in an overestimate of the
“true” coefficient of $X_1$.

The estimated adjustment coefficient in the time series analysis was about
.2, whereas in this cross-sectional study it is about .6. But there is no
contradiction here. The first figure refers to the fraction of the adjustment
accomplished during one year, whereas the second refers to the fraction of
disequilibrium eliminated in 5 years. These should be different. Actually, if we


15 The major time series analysis results are summarized in the regression (1) $\log Y_t = \cdots - .53 \log X_{1t} + .76 \log Y_{t-1}$,
$R^2 = .98$, where $Y$ is total U.S. plant nutrient consumption per acre of cropland used for crops, $X_1$
is the price paid per plant nutrient unit divided by the index of prices received by farmers for all crops, and the
period is 1911-1956. The implicit adjustment coefficient is .24, and the estimate of the long run price elasticity
(-.53/.24) is approximately -2. For a more detailed description, see Zvi Griliches, “The Demand for Fertilizer
were to assume an adjustment coefficient of .2 for the one year interval, this
would imply that approximately .67 of the disequilibrium would be eliminated
in 5 years, which is very close to the cross-sectional estimate¹⁶ of .6. Hence, the
time series and the cross-sectional studies agree on this point.
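
A minimal sketch of this calculation follows, assuming the partial adjustment model in which a constant fraction b of the remaining disequilibrium is removed each year. The function names are illustrative; the closed form and the period-by-period sum are two ways of writing the same quantity.

```python
def fraction_adjusted(b, periods):
    """Share of an initial disequilibrium eliminated after `periods` years
    when a fraction b of the remaining gap is closed every year."""
    return 1.0 - (1.0 - b) ** periods


def fraction_adjusted_by_sum(b, periods):
    """The same quantity built up year by year: b, then b(1-b), and so on."""
    return sum(b * (1.0 - b) ** i for i in range(periods))


one_year_b = 0.2
five_year_b = fraction_adjusted(one_year_b, 5)
print(round(five_year_b, 2))
```

With a one year coefficient of .2 the five year figure is about .67, the value quoted in the text; with the time series estimate of .24 the same function gives a somewhat larger share.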

One of the still obscure areas of econometrics is the relationship between 
cross-sectional and time series estimates of similar functions. Even if there 
were no underlying differences in micro parameters, the two estimates may not 
coincide if the time series and cross-sectional differences in the independent
variables are not of the same character. For example, the year to year changes
in prices may be of a rather short run nature, and the response to them may be
much smaller than it would be to similar but persistent cross-sectional
differences. That is, the two bodies of data may be afflicted to a different degree by
“transitory” variations in the variables.¹⁷

If the micro parameters are not the same, the estimates may also differ
because of a difference in “aggregation bias.”¹⁸ However, as the two analyses
contain a different number of variables and, as the variables are defined and 
measured differently in each case, one could not expect an exact correspondence 
between the two estimates of the demand elasticities, even if all the other dif- 
ficulties did not exist. At this stage, all that one can reasonably expect from 
these different bodies of data is consistency in the signs and orders of magnitude 
of the coefficients, and they do live up to this expectation. 


16 Knowing what the one year adjustment coefficient is, one can calculate the implied 5 year adjustment
coefficient by the following formula: $b_5 = 5b - 10b^2 + 10b^3 - 5b^4 + b^5$, where $b$ is the “one year” adjustment coefficient.
The proportion of the total disequilibrium eliminated after $N$ periods is given by $b_N = \sum_{i=0}^{N-1} b(1-b)^i$. The formula
for $b_5$ is a simplification of this more general formula.

17 See M. Friedman, A Theory of the Consumption Function, Princeton: Princeton University Press, 1957.

18 Cf. H. Theil, Linear Aggregation of Economic Relations, Amsterdam: North Holland Publishing Co., 1954.


MAPS BASED ON PROBABILITIES 


MIECZYSLAW CHOYNOWSKI
Polish Academy of Sciences, Warsaw* 


A map which shows, by hatching or shading, the geographical distri- 
bution of some phenomenon in terms of absolute frequencies or per- 
centages may usefully be supplemented by a map showing the probabili- 
ties of the observed deviations from the mean, calculated on the 
assumption that the true geographic distribution is uniform. Such a 
supplementary map helps guard against attaching geographic sig- 
nificance to random variations. 


While studying the distribution of brain tumors in a part of Poland [1]
I had occasion to employ a type of statistical map which, so far as I
know, has not previously been used and which seems potentially useful in a 
variety of applications. The method is so simple that there is no need to present 
it in detail. Everybody who finds it likely to be useful in his work will be able 
to use it and to improve upon it, having read this suggestion. 

Statistical maps as presently used [2] show, mostly by hatching or shading, 
the distribution of some phenomenon over a geographical area in terms of 
absolute frequencies or percentages. While this method is useful as a simple 
description of the data, it may have shortcomings when one wants to make 
inferences based on the spatial distribution of the phenomenon, as was the case 
in my study of brain tumors. 


TABLE 385

NUMBER OF BRAIN TUMORS AND RESULTING PROBABILITIES IN 17
COUNTIES OF THE RZESZOW DISTRICT CHOSEN FOR ILLUSTRATION

              Population    No. of Tumors     Number of Tumors
County            in             for        --------------------   Probability
              Thousands       100,000       Expected    Observed

Brzozow           70            ...           ...         ...        0.0264
Debica           110            ...           ...         ...        0.3292
Gorlice          ...            ...           ...         ...        0.0314
Jaroslaw         102            ...           ...         ...        0.4318
Jaslo            101            ...          5.222        ...        0.4025
Kolbuszowa        60            ...           ...         ...        0.1844
Krosno           101            ...           ...         ...        0.1072
Lesko             17            ...          0.879        ...        0.2199
Lubaczow          40            ...          2.068        ...        0.3881
Lancut            91            ...          4.705        ...        0.3324
Mielec            73            ...          3.774        ...        0.5211
Nisko             67            ...          3.464        ...        0.3277
Przemysl          88            ...          4.550        ...        0.3057
Przeworsk         57            ...          2.947        ...        0.4330
Rzeszow          182            ...          9.409        ...        0.1560
Sanok             56            ...           ...         ...        0.3293
Tarnobrzeg        67            ...          3.464        ...        0.5227


* This paper was written while the author was a Visiting Ford Foundation Fellow in the United States.


Fig. 386. Rates of brain tumors per 100,000 in 17 counties of the Rzeszow district.


Having constructed a map which showed the rates of brain tumors in sixty
counties (poviat) of Southern Poland (a part of this map is shown in Figure 386),
I found that some of the rates were very high or very low in comparison with
the average for the whole area, this average being 5.17 per 100,000 inhabitants 
(see the third column of Table 385 containing data for 17 counties). As these 
marked geographic irregularities were very surprising, I studied the data fur- 
ther and noticed that the counties deviating without any clear explanation 
(such as quality of medical care, differences in age composition, etc.) had small 
populations. Consequently, even a small difference in absolute frequencies 
created a substantial difference in rates. To put it differently, the deviations 
might well be attributable to sampling variations. 
It then occurred to me to construct a map showing not the incidence of
tumors, but the probability of these incidences if in fact the true incidence were
the same for the whole area.

Since brain tumors are comparatively rare (12 per 100,000 persons in a period 
of seven years in the county having the largest incidence), the Poisson dis- 
tribution should be satisfactory for calculating the probability of any given 
number of tumors. Table 385 shows the data and the resulting probabilities for 
17 counties. These probabilities are either upper- or lower-tail probabilities, 
depending upon whether the observed rate is above or below the mean rate. 
Table 385 shows that though one county with a low rate and one with a
high rate have probabilities less than 0.05 (and in the whole area there were
some with even less than 0.01), some of the rates that are most striking and
difficult to explain have comparatively high probabilities. Gorlice, which has
excellent medical care, thanks to which tumors are well diagnosed, had a very
high rate which significantly exceeded the average for all 60 counties. Lesko
had a very high rate which appeared not to deviate significantly from the
average. Brzozow, which is a rather backward county with less adequate medical
care, had a very low rate which was significantly less than the average, while
Kolbuszowa and Krosno had very low rates which did not produce unusually
low probabilities.

Fig. 387. Probabilities of observed numbers of tumors in 17 counties.
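
The tail probabilities described above can be sketched in a few lines. The mean rate of 5.17 per 100,000 is taken from the text; the population figures of 17 and 60 thousand are read from Table 385 for Lesko and Kolbuszowa, and the observed counts of 2 and 1 are assumptions, chosen because they reproduce the tabulated probabilities. The function names are illustrative only.

```python
import math

MEAN_RATE_PER_100K = 5.17   # average rate for the whole area (from the text)


def poisson_pmf(k, lam):
    """Probability of exactly k events under a Poisson law with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def tail_probability(observed, population_thousands, rate=MEAN_RATE_PER_100K):
    """Lower-tail probability P(X <= observed) when the observed count is
    below the expected count, upper-tail P(X >= observed) otherwise."""
    expected = rate * population_thousands / 100.0
    if observed < expected:
        return sum(poisson_pmf(k, expected) for k in range(observed + 1))
    return 1.0 - sum(poisson_pmf(k, expected) for k in range(observed))


# Assumed observed counts (2 and 1); populations as read from Table 385.
lesko = tail_probability(2, 17)        # small county, high rate
kolbuszowa = tail_probability(1, 60)   # larger county, low rate
print(round(lesko, 4), round(kolbuszowa, 4))
```

Both values are far above 0.05, which illustrates the point of the paragraph: a rate that looks extreme on the map of Figure 386 need not be improbable once the size of the population is taken into account.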

A map with hatching corresponding to the probabilities of Table 385 is 
shown in Figure 387. Such a map makes it possible to study the spatial dis- 
tribution of a phenomenon without danger of basing one’s conclusions on non- 
significant random variations. This method may be used for any phenomena 
distributed geographically, for example, in medicine, zoology, ecology, botany, 
sociology, economics, etc.
PARAMETER ESTIMATES AND AUTONOMOUS GROWTH*


W. A. NEISWANGER AND T. A. YANCEY 
University of Illinois 


The influence of certain specification errors on estimates of param- 
eters in economic models is examined using Monte Carlo techniques. 
Autonomous growth is a secular change in the endogenous variables not
explained by the exogenous variables and parameters of the structural
equations. Autonomous growth will therefore appear in the shocks, and
this usually causes them to become correlated with the exogenous
variables in economic models, which is a specification error.

The influence of linear autonomous growth on least squares (LS) and 
limited information single equation (LISE) estimates is examined using 
simulated economic data. Estimates by both methods are badly biased 
when autonomous growth is present but ignored, and the use of prob- 
ability theory tends to give very bad decisions. A simple change in the 
model removes the difficulty for the LISE estimates. Some procedures 
to improve estimates are suggested when linear autonomous growth is 
thought to be present. 


1. INTRODUCTION


Estimates of the magnitude of parameters in economic models are made by
the use of methods which have well defined specifications. When economic
time series data are used in making the estimates in the context of the small
models which are customarily used because of limited data, the conditions
actually met usually do not conform to the specifications.¹

Our purpose is to make estimates of known parameters under some of these 
commonly met conditions which violate specifications of the estimation 
methods employed. These experimental estimates will be made under controls 
which reveal the influence of such specification errors. Two methods of estima- 
tion, least squares (referred to hereafter as LS) and limited information single 
equation methods (referred to as LISE) are tested in two models. 

It is already known that LS estimates are biased when the disturbances
and the explanatory variables are correlated² and the upper limits of such


* The research reported here was made possible by grants from the Research Board of the Graduate College
of the University of Illinois and the National Science Foundation. Vincent I. West, Department of Agricultural
Economics, University of Illinois, participated in the early planning phases of this project. Kern W. Dickman of the 
University Social Sci Cc Cc ittee constructed required programs for the digital computer, the “Illiac.” 
Robert Resek, Everett Schleter and Vorris Blankenship, while undergraduate students in economics, adapted many 
existing programs to our use, originated others, organized the data for processing, presided over the computing and 
the resulting files of tapes and print-out sheets. 

1 A model, based on hypotheses about economic behavior, specifies the form of the structural equations, the
restrictions on the unknown parameters in the equations, and some joint density function which generates the un- 
observed shocks. The method of estimation requires that certain conditions be met in the model to get estimates 
with desirable properties. A specification error occurs when some of the specifications of the estimating method as 
reflected in the model are not met. A failure to meet the requirement that the covariance between the disturbance 
and the explanatory variables be zero or the presence of errors of observation in all data illustrate specification 
errors. An incorrect economic theory is not here called a specification error. 

2 J. Bronfenbrenner, “Sources and size of least-squares bias in a two equation model,” T. C. Koopmans and 
W. C. Hood, ed., Studies in Econometric Method, New York: John Wiley and Sons, Inc., 1953, pp. 37-8. L. Hurwicz, 
“Least squares bias in time series,” T. C. Koopmans, ed., Statistical Inference in Dynamic Economic Models, New 
York: John Wiley and Sons, Inc., 1950, pp. 365-83. H. Wold and L. Jureen, Demand Analysis, New York: John 
Wiley and Sons, Inc. 1953, pp. 37-8. 




390 AMERICAN STATISTICAL ASSOCIATION JOURNAL, JUNE 1959 


bias can be set under this condition.³ The effects of the specification errors we
examine are unknown when LISE methods of estimation are used.⁴ More
specifically, the effects on estimates when the variables designated as exogenous
in the equations are endogenous in some larger system are unknown. Nor do
we know how the results of the two methods of estimation compare when
appreciable correlation exists between the disturbances and explanatory variables
and among the explanatory variables.

Some of the unsolved problems of estimation can be examined fruitfully by
Monte Carlo techniques. Ladd has studied the influence of observational errors
on estimates of parameters by mathematical techniques which specify such
errors do not exist.⁵ Wagner has constructed distributions of estimates of
structural parameters by three different methods to examine the effect of small
samples on the distribution of estimates.⁶ Orcutt and Cochrane have examined
the pervasiveness of autocorrelation in economic time series, and its effect on
the estimates of the parameters, their standard errors and on the bias in the
residuals.⁷

In our use of Monte Carlo methods we have constructed simulated economic 
data and models. The objective is to show how disparate rates of growth in 
economic time series may affect the estimates if those disparate rates of growth 
in endogenous variables are in part due to what we call autonomous growth. 
Autonomous growth is a secular change in the endogenous variables not ex- 
plained by the exogenous variables and parameters of the structural equations. 

Autonomous growth as here defined will cause the shocks to become corre- 
lated with other variables. How these correlations affect estimates when all 
variables are correlated and how, in this experiment, the introduction of an- 
other variable in the equations influences the results will also be examined. 
Finally, we also show what happens to estimates under our experimental con- 
ditions when the data are not complicated by disparate rates of growth, but 
the z’s, with given means, variances and covariances are allowed to differ, 
randomly, from sample to sample. 


2. THE MODELS 


The models were chosen for their simplicity and their representativeness of a 
class of models constructed by economists. The model first used in this experi- 


3 H. Wold and P. Faxér, “On specification error in regression analysis,” Annals of Mathematical Statistics
(Mar. 1957), pp. 265-9.

4 For a special case see H. Chernoff, “The computation of maximum likelihood estimates of parameters of a 
system of linear stochastic difference equations with serially correlated disturbances,” Annals of Mathematical 
Statistics (Mar. 1949), p. 137. 

5 George W. Ladd, “Effects of shocks and errors in estimation, an empirical comparison,” Journal of Farm 
Economics (May 1956), pp. 485-95. 

6 H. M. Wagner, “A Monte Carlo study of estimates of simultaneous linear structural equations,” Econometrica
(Jan. 1958), pp. 117-33.

7 G. H. Orcutt, “A study of the autoregressive nature of time series used for Tinbergen’s model of the economic
system of the United States, 1919-1932,” Journal of the Royal Statistical Society, Vol. 10 (1948).

G. H. Orcutt, and D. Cochrane, “A sampling study of the merits of autoregressive and reduced form trans- 
formations in regression analysis,” Journal of the American Statistical Association (Sept. 1949). 

Cochrane, D., and G. H. Orcutt, “Application of least squares regression to relationships containing
autocorrelated error terms,” Journal of the American Statistical Association (Mar. 1949).

8 Experimental work now in process but not yet ready for publication is designed to show the influence, if any,
of multicollinearity on the bias due to the correlation of variables with the shocks and to further explore, using
data generated by use of difference equations and employing lagged endogenous variables as well as exogenous
variables, the effects of correlations between shocks and other variables.
ment is the set of over-identified structural equations⁹ originally constructed
and used by Ladd.¹⁰

The following model will be referred to as Model I:

$$y_1(t) = \beta_{12} y_2(t) + \gamma_{11} z_1(t) + \gamma_{12} z_2(t) + \gamma_{10} + u_1(t)$$
$$y_1(t) = \beta_{22} y_2(t) + \gamma_{23} z_3(t) + \gamma_{24} z_4(t) + \gamma_{20} + u_2(t)$$

The second model is the same as the first except for the introduction of a $z_5$
term for time. This will be referred to as Model II. Thus:

$$y_1(t) = \beta_{12} y_2(t) + \gamma_{11} z_1(t) + \gamma_{12} z_2(t) + \gamma_{15} z_5(t) + \gamma_{10} + u_1(t)$$
$$y_1(t) = \beta_{22} y_2(t) + \gamma_{23} z_3(t) + \gamma_{24} z_4(t) + \gamma_{25} z_5(t) + \gamma_{20} + u_2(t)$$

In both of these, the y’s are endogenous variables, and the z’s are exogenous.
The $u_1$ and $u_2$ are shocks, i.e. unexplained variation.¹¹ Those parameters which
are common to both models have the same value in each model. These values
are:

                  β12     β22     γ11     γ12     γ23     γ24     γ10     γ20

Models I & II    -.40     .60     .10     .45     .25     .80     100      50


Over-identified equations are used because they are characteristic of
economic models. The equations in each model may be thought of as supply and
demand functions in partial equilibrium analysis. In such a case $y_1$ could
represent the quantity sold and $y_2$ could be a price term. In this context we have
given the parameter of $y_2$ a minus sign in the first equation and a plus sign in
the second, suggesting a positive relationship between quantity and price in
supply equations and a negative relationship in demand equations. However,
all data used are constructed so that the parameters themselves may be known, 
and no complete analogy to supply and demand functions exists in the model or 
is intended. 


3. THE DATA 


There are 3000 independent observations in each z and y. These have been 
grouped into 120 sets of 25 observations each to give 120 samples with n=25. 
Each of the samples may be thought of as a time series spanning 25 years and 
in this context, we have random samples of 120 observations on each of the 25 
yearly values of z and y. The independent observations on z’s and u’s (later


9 The equations are identified, since the rank condition for identification is satisfied for each equation in each
model. For both models, the rank of the restrictions matrix is $G - 1 = 1$, i.e.,

$$\rho\begin{pmatrix} 0 & 0 \\ \gamma_{23} & \gamma_{24} \end{pmatrix} = 1 \quad\text{and}\quad \rho\begin{pmatrix} \gamma_{11} & \gamma_{12} \\ 0 & 0 \end{pmatrix} = 1$$

for equations 1 and 2, respectively. The order condition is $G^{\Delta} - 1 = 1 < K^{**} = 2$ for each of the equations.
$G^{\Delta}$ refers to the number of endogenous variables in a particular equation, and $K^{**}$ refers to exogenous variables
in the system not in a particular equation being estimated. The order condition indicates each of the equations is
over-identified. Autocorrelation in the variables may destroy identification, but that does not happen in this case.

10 Op. cit. 

11 Unexplained variation is usually defined as the aggregate effect of exogenous variables omitted from the 
equations. 
modified to include growth components) were generated from Wold’s Tables
of Random Normal Deviates.¹²

The stochastic properties of the z’s are not the result of observational errors
but are assumed to be characteristic of the “true” behavior of series which are
endogenous variables in some larger system. The rationale of this concept of the
z’s, which we accept for these experimental purposes, will be found in
Haavelmo’s work.¹³ Briefly it is that any single observation we have of, say, an annual
total is but one value for the annual total which the complex of relations
existing during that period might have produced. Because of the multiplicity of
minor causes which exist in such a structure the many values which such an
annual total might reveal, if they could be observed, are assumed to have
stochastic properties.

That is, of course, another way of saying that in some larger true structure of 
an economic system almost all unlagged variables are interdependent and 
mutually determined. An economic variable is therefore exogenous only with 
respect to some sub-structure of the system as a whole. Obviously the designa- 
tion of an economic variable as exogenous in such a small system does not 
change its statistical properties. It is in this context that our z’s are from some 
joint density function and are permitted to differ randomly from sample to 
sample.¹⁴

The values assigned to the means, variances and covariances of the z’s and
u’s are:¹⁵


Means 478 381 478 43 


12 Department of Statistics, University of London, Cambridge: Cambridge University Press, 1954. Tracts for 
Computers, No. XXV. 

13 Trygve Haavelmo, “The probability approach in econometrics,” Econometrica, Vol. 12, Supplement (Jul.
1944), pp. 49-51, 69, 70.

14 For a discussion involving conditional probability density functions where the z’s do not change from sample
to sample see T. C. Koopmans and W. C. Hood, “Estimation of simultaneous economic relationships,” Studies in 
Econometric Method, New York: John Wiley & Sons, Inc., 1953, p. 120. 

15 The z’s and u’s with these properties were generated for each sample of 25 by a transformation from a matrix
of the random variables:

\[
\begin{pmatrix} z_1(t) \\ z_2(t) \\ z_3(t) \\ z_4(t) \\ u_1(t) \\ u_2(t) \end{pmatrix}
=
\begin{pmatrix}
P_{11} & 0 & \cdots & 0 \\
P_{21} & P_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
P_{61} & P_{62} & \cdots & P_{66}
\end{pmatrix}
\begin{pmatrix} s_1(t) \\ s_2(t) \\ \vdots \\ s_6(t) \end{pmatrix}
+
\begin{pmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ 0 \\ 0 \end{pmatrix},
\qquad t = 1, \dots, 25.
\]

The P matrix is a triangular matrix such that PP′ = M, where M is the population variance-covariance matrix of
the z’s and u’s. The s_i(t) are random normal deviates with mean 0 and variance 1 from Wold’s table (op. cit.) and
the m_j are the means of the z’s. The elements P_51, ..., P_54 and P_61, ..., P_64 are zero due to the specification of
independence of z’s and u’s. Tracts for Computers, op. cit., pp. xi-xiii.


Variances and covariances

          z₁        z₂        z₃          z₄        u₁        u₂
z₁       2540      1434   [illegible]    503.2       0         0
z₂                 3106      1622        341.1       0         0
z₃                           3248        873.3       0         0
z₄                                       285.8       0         0
u₁                                                  100    [illegible]
u₂                                                         [illegible]


These means, variances and covariances of the z’s were set after an examination
of these magnitudes in several of the series used by Klein in his model III.¹⁶
The variances and covariances of the y’s are determined by the structural
parameters, the z’s and the u’s.

The following correlation matrix for the z’s and u’s may be obtained from 
the variance-covariance matrix: 


22 


1.0000 
1.0000 -0000 ‘ 
.0000 -0000 
-0000 -0000 
1.0000 2800 
1.0000 


The population correlations among the z’s range from .36 between z₂ and z₄
to .91 between z₃ and z₄. There is no correlation between the z’s and u’s.
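The step from the variance-covariance matrix to the correlation matrix can be checked with a short calculation. The sketch below uses only the entries for z₂, z₃ and z₄ that are legible in the variance-covariance table; the assignment of each number to a particular row and column is our reading of that table, not something the text states explicitly.

```python
import math

# Variances and covariances as read from the variance-covariance table
# (assumed positions: 341.1 is cov(z2, z4), 873.3 is cov(z3, z4)).
var = {"z2": 3106.0, "z3": 3248.0, "z4": 285.8}
cov = {("z2", "z3"): 1622.0, ("z2", "z4"): 341.1, ("z3", "z4"): 873.3}


def correlation(a, b):
    """Population correlation r_ab = cov(a, b) / sqrt(var(a) * var(b))."""
    return cov[(a, b)] / math.sqrt(var[a] * var[b])


for pair in cov:
    print(pair, round(correlation(*pair), 4))
```

Under this reading the smallest and largest correlations agree, to two decimals, with the .36 and .91 quoted in the text.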

The parameters from the structural equations were then substituted into 
the reduced forms: 


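\[
y_1(t) = \frac{1}{\beta_{12} - \beta_{22}}
\bigl[(\beta_{12}\gamma_{20} - \beta_{22}\gamma_{10})
+ (-\beta_{22}\gamma_{11}) z_1 + (-\beta_{22}\gamma_{12}) z_2
+ \beta_{12}\gamma_{23} z_3 + \beta_{12}\gamma_{24} z_4
- \beta_{22} u_1 + \beta_{12} u_2 \bigr]
\]

\[
y_2(t) = \frac{1}{\beta_{12} - \beta_{22}}
\bigl[(\gamma_{20} - \gamma_{10}) - \gamma_{11} z_1 - \gamma_{12} z_2
+ \gamma_{23} z_3 + \gamma_{24} z_4 - u_1 + u_2 \bigr]
\]

giving

\[
y_1(t) = 80 + .06 z_1(t) + .27 z_2(t) + .10 z_3(t) + .32 z_4(t) + .6 u_1 + .4 u_2
\]

\[
y_2(t) = 50 + .10 z_1(t) + .45 z_2(t) - .25 z_3(t) - .80 z_4(t) + u_1 - u_2
\]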


The 25 values of each zᵢ and uᵢ for each sample were then used to obtain 25
values of y₁(t) and y₂(t). This process was repeated for each of the 120 samples.
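As a rough check on the substitution of structural parameters into the reduced forms, the sketch below solves a two-equation structure directly and compares the result with the numerical reduced forms. The structural form (both equations normalized on y₁) and the two intercepts are assumptions inferred from the printed coefficients; they are not stated in this section. The coefficient .6 on u₁ in the first reduced form is likewise inferred from β₂₂ = .60.

```python
# Structural parameters as listed in the tables of this paper.
B12, B22 = -0.40, 0.60
G11, G12, G23, G24 = 0.10, 0.45, 0.25, 0.80
# Hypothetical intercepts: not given in this section, chosen so that the
# reduced-form constants come out as the printed 80 and 50.
G10, G20 = 100.0, 50.0


def solve_structure(z, u):
    """Solve the assumed structure for (y1, y2):
        y1 = B12*y2 + G11*z1 + G12*z2 + G10 + u1
        y1 = B22*y2 + G23*z3 + G24*z4 + G20 + u2
    """
    z1, z2, z3, z4 = z
    u1, u2 = u
    rhs1 = G11 * z1 + G12 * z2 + G10 + u1
    rhs2 = G23 * z3 + G24 * z4 + G20 + u2
    # Subtracting one equation from the other eliminates y1.
    y2 = (rhs2 - rhs1) / (B12 - B22)
    y1 = B12 * y2 + rhs1
    return y1, y2


def printed_reduced_form(z, u):
    """Numerical reduced forms as printed (with .6*u1 assumed in y1)."""
    z1, z2, z3, z4 = z
    u1, u2 = u
    y1 = 80 + .06 * z1 + .27 * z2 + .10 * z3 + .32 * z4 + .6 * u1 + .4 * u2
    y2 = 50 + .10 * z1 + .45 * z2 - .25 * z3 - .80 * z4 + u1 - u2
    return y1, y2


z = (478.0, 381.0, 478.0, 43.0)   # the means of the z's
u = (1.5, -2.0)                   # arbitrary illustrative shocks
print(solve_structure(z, u), printed_reduced_form(z, u))
```

The two functions agree up to floating-point rounding for any z and u, which suggests the assumed structure is at least consistent with the printed coefficients.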

These data are static. There are no autocorrelations in any series except those 
introduced by the chance selection of random numbers for the samples. They 
were used in constructing a second set of data in which all series are autocor- 
related and to provide estimates of parameters which could be compared with 
similar estimates from the autocorrelated series. 

The second set of data was generated by adding a growth increment per
unit of time, δᵢ, to the z’s and u’s in the form zᵢ(t) + δᵢ(t − 13) and uᵢ(t) + δᵢ(t − 13).
The magnitudes of the trends used were determined after an examination of the
series used in Klein’s model III.¹⁷ They are:


          z₁       z₂       z₃       z₄       u₁       u₂
δᵢ       4.00    10.00     4.00     0.50     4.40     3.40


16 Lawrence R. Klein, Economic Fluctuations in the United States, New York: John Wiley & Sons, Inc., 1950.
17 Op. cit.

Following these adjustments for growth, a second set of y’s was generated for
each of the 120 samples using z’s and u’s containing the above trends.

The introduction of growth elements in the data is a further step toward 
realism in simulated economic data and, in this case, caused correlation to 
appear between the z’s and u’s—a specification error. The use of linear growth 
functions—often characteristic of economic time series in spans of 20 to 25 
years—is an important feature of the experimental design. The linear approxi- 
mation to reality permits the experimenter to interpret his results accurately 
through fairly simple adjustments and changes in model. This is important in 
the first stages of a sequence of experiments where the aim is to isolate and 
measure the influence of one specification error at a time. As the work progresses 
the influence of several specification errors can be observed at once. In the 
present simpler case, the introduction of these linear growth patterns increased
the correlation among all variables excepting z₃ and z₄.

We have defined autonomous growth as a secular change in the endogenous 
variables not explained by the exogenous variables and parameters of the structural
equations. Using this second set of data, the secular growth in y₁ is
+7.50 and in y₂ is +4.50 per period of time as generated in the reduced forms.
Of this, +3.50 and +3.50, respectively, is explained by the z’s in Model I.
The autonomous growth is +4.00 in y₁ and +1.00 in y₂.¹⁸
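The growth figures follow from multiplying each trend increment by the corresponding reduced-form coefficient. The sketch below repeats that arithmetic; the coefficient .6 on u₁ in the first reduced form is an assumption inferred from β₂₂ = .60, and the names used are illustrative only.

```python
# Trend per period for each variable (delta_i in the text).
trend = {"z1": 4.00, "z2": 10.00, "z3": 4.00, "z4": 0.50, "u1": 4.40, "u2": 3.40}

# Numerical reduced-form coefficients (the .6 on u1 is inferred, not printed).
coef_y1 = {"z1": .06, "z2": .27, "z3": .10, "z4": .32, "u1": .6, "u2": .4}
coef_y2 = {"z1": .10, "z2": .45, "z3": -.25, "z4": -.80, "u1": 1.0, "u2": -1.0}

Z_NAMES = ("z1", "z2", "z3", "z4")
U_NAMES = ("u1", "u2")


def growth(coef, names):
    """Growth per period contributed by the named variables."""
    return sum(coef[k] * trend[k] for k in names)


for label, coef in (("y1", coef_y1), ("y2", coef_y2)):
    explained = growth(coef, Z_NAMES)    # growth explained by the z's
    autonomous = growth(coef, U_NAMES)   # growth entering through the shocks
    print(label, round(explained, 2), round(autonomous, 2),
          round(explained + autonomous, 2))
```

With these inputs the explained, autonomous and total growth reproduce the figures quoted in the paragraph above.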

The trends in the u’s, due to exogenous variables omitted from the equations, 
are the source of the autonomous growth. These unknown variables should, 
of course, be explicitly included in the model rather than be permitted to oper- 
ate through the shocks. Such a change in the model would be expected to reduce 
the correlation between z’s and u’s to about zero on the average as required by 
the method of estimation used. Some change is required in Model I which will, 
in effect, remove the trends from the shocks and so reduce the correlation between
the z’s and u’s. In the absence of such changes, estimates will be subject
to the autonomous-growth-error in estimation. 

In our case, the introduction of z₅, representing time, in Model II reduces the
correlation between the shocks and the z’s to about zero on the average, as required
by the methods of estimation used.

Table 395a shows the correlations among variables after autonomous growth 
was included in all variables and indicates how the correlations in all but one 
case were increased.

The correlation between y₂ and u₁ was .2575 and between y₂ and u₂ was
−.2112 before the autonomous trends were introduced into the z’s and u’s.
The increase in correlation between y₂ and the shocks, due to the autonomous
trends, will make the LS estimates appreciably worse if no corrective action is
taken in obtaining the estimates.


4. ESTIMATES OF THE PARAMETERS 


Each of the two groups of 120 samples (with and without growth elements) 
was used to obtain 120 estimates of each of the parameters and of their stand- 


18 Autonomous growth may be obtained by adding the values of trend per period for u₁(t), 4.40(t − 13), and
u₂(t), 3.40(t − 13), to u₁ and u₂ in the last two terms of the reduced forms containing the numerical values of the
coefficients.

TABLE 395a. POPULATION CORRELATION MATRIX WHEN ALL VARIABLES
INCLUDE TRENDS FOR TWENTY-FIVE PERIODS


Y2 


1.000 — .1660 
-6149 
-3816 
-8851 

1.000 


ard errors. Several combinations of data, methods and models were used in 
making these estimates as will be seen in Tables 395b through 397. 

In each table, $\theta$ is used to represent a parameter; $\hat\theta$ refers to a sample estimate
of $\theta$; $\bar\theta$ represents the mean of a sampling distribution of estimates $\hat\theta$; $\sigma_{\hat\theta}$ is the
standard deviation of the sampling distribution of $\hat\theta$; and $\hat\sigma_{\hat\theta}$ is the standard error
for some estimate $\hat\theta$. Line 1 in Tables 395b, 396 and 397 shows
the mean of the distribution of the estimates of a parameter, $\bar\theta$; line 2, the standard
deviation of such distributions; line 3, the mean of the distribution
of standard errors; line 4, the root-mean-square deviation of the estimates
from a parameter; line 5, the number of estimates in 120 which are greater than
$2\hat\sigma_{\hat\theta}$; line 6, the number of estimates which are more than $2\hat\sigma_{\hat\theta}$ from the parameter;
and line 7, the number of estimates which are more than $2\hat\sigma_{\hat\theta}$ from the mean of
that distribution.
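The seven summary lines can be computed from a list of estimates and their standard errors roughly as follows. This is a minimal sketch: the function name is hypothetical, the standard deviation uses divisor n, and line 5 is taken in absolute value; both of those choices are assumptions, since the text does not say which convention was used.

```python
import math


def summarize(estimates, std_errors, theta):
    """Return the seven summary lines for one parameter theta."""
    n = len(estimates)
    mean = sum(estimates) / n
    # Assumption: divisor n, so that rmse**2 == sd**2 + bias**2 exactly.
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / n)
    mean_se = sum(std_errors) / n
    rmse = math.sqrt(sum((e - theta) ** 2 for e in estimates) / n)
    pairs = list(zip(estimates, std_errors))
    return {
        "1_mean": mean,
        "2_sd": sd,
        "3_mean_se": mean_se,
        "4_rmse": rmse,
        # Assumption: "greater than 2 standard errors" is read as |estimate|.
        "5_est_gt_2se": sum(abs(e) > 2 * s for e, s in pairs),
        "6_far_from_theta": sum(abs(e - theta) > 2 * s for e, s in pairs),
        "7_far_from_mean": sum(abs(e - mean) > 2 * s for e, s in pairs),
    }


example = summarize([1.0, 2.0, 3.0], [0.4, 0.4, 0.4], theta=2.0)
print(example)
```

With divisor n, line 4 squared equals line 2 squared plus the squared bias, which is the relationship the tabled root mean square errors appear to satisfy.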

The results in Table 395b were obtained using Model I and data characterized 
by growth in all variables including autonomous growth in y; and ye. The es- 


TABLE 395b. MEAN VALUES OF COEFFICIENTS WITH MEASURES OF
DISPERSION FOR 120 SAMPLES OF 25 OBSERVATIONS EACH BY
LISE AND LS METHODS. DATA WITH TRENDS IN
z’S AND u’S. MODEL I


                                                                  β₁₂      β₂₂      γ₁₁      γ₁₂      γ₂₃      γ₂₄
Parameters, $\theta$                                              −.40      .60      .10      .45      .25      .80

Least Squares Estimates
1. Mean of Estimates, $\bar\theta$                               −.0162    .9520    .1685    .5581    .4372    .6475
2. Standard Deviation of Estimates, $\sigma_{\hat\theta}$         .1765    .1145    .0994    .0939    .2061    .7439
3. Mean of Standard Errors, $\hat\sigma_{\hat\theta}$             .1928    .1167    .1153    .1139    .1700    .6323
4. Root Mean Square Error                                         .4224    .3703    .1207    .1433    .2784    .7593
5. $\hat\theta$ > 2 Standard Errors                                  6      120       33      120       40       27
6. $|\hat\theta - \theta|$ > 2 Standard Errors                      62      101        7       13       32       10
7. $|\hat\theta - \bar\theta|$ > 2 Standard Errors                   4  [illegible]    1        3       10       10

Limited Information Estimates
1. Mean of Estimates, $\bar\theta$                               −.3063    .9943    .0997    .6986    .4003    .7791
2. Standard Deviation of Estimates, $\sigma_{\hat\theta}$         .2444    .1346    .1080    .1184    .2141    .7751
3. Mean of Standard Errors, $\hat\sigma_{\hat\theta}$             .2451    .1481    .1279    .1377    .1768    .6843
4. Root Mean Square Error                                         .2618    .4166    .1080    .2753    .2616    .7752
5. $\hat\theta$ > 2 Standard Errors                                 20      120       16      120       74       33
6. $|\hat\theta - \theta|$ > 2 Standard Errors                      14      101        3       47       21        8
7. $|\hat\theta - \bar\theta|$ > 2 Standard Errors                  10        7        3        6       10        9


TABLE 395a (columns u₁ and u₂)

          u₁       u₂
y₁      .8976    .8796
y₂      .7615    .6508
z₁      .4738    .4666
z₂      .7546    .7433
z₃      .4307    .4242
z₄      .1989    .1960
u₁      1.000    .9248
u₂               1.000


TABLE 396. MEAN VALUES OF COEFFICIENTS WITH MEASURES OF
DISPERSION FOR 120 SAMPLES OF 25 OBSERVATIONS BY LISE
AND LS METHODS. DATA WITH TRENDS IN z’S AND u’S.
MODEL II


Br 


Parameters .60 


Least Squares Estimates 
1. Mean of Estimates, 6 
2. Standard Deviation of 
Estimates, 09 
3. Mean of standard errors 
4. Root Mean Square Error 
5. @ >2 Standard Errors 
6. 2 3 >2 Standard Errors 
7. |@—6| >2 Standard Errors 


Limited Information Estimates 
1. Mean of Estimates, 6 
2. Standard Deviation of 
Estimates, 
3. Mean of standard errors 
4. Root Mean Square Error i 
5. @ >2 Standard Errors 120 
6. |? zm >2 Standard Errors 8 
7. |@—6| >2 Standard Errors 8 


timates of the parameters from each of the 120 samples gave values which 
differed from the parameters by as much as 100 per cent. In the case of β₂₂,
84 per cent of the estimates by both LS and LISE methods were more than two
standard errors from the parameter (line 6, Table 395b). If one had computed
95 per cent confidence limits for β₁₂ and β₂₂ based on the ‘t’ distribution using
the 120 sample estimates and their standard errors, the parameters would have
fallen outside the limits in over 60 per cent of the cases. With the exception of
LS estimate γ₁₂, those estimates which one would accept as significantly different
from zero using a five per cent level test tend to be those which are farthest
from the parameter, while some of the best estimates, γ₁₁ and γ₂₄ by LISE, had
standard errors which were on the average as large as the estimates themselves.
The root mean square error (line 4) is considerably larger than the standard
error in most cases due to the bias in the estimates.

Table 396 is a summary of results when z₅, time, is added to Model I, creating
Model II. There is an appreciable improvement in all the bad estimates by both
methods of estimation, and the standard errors are reduced by about 50 per
cent. This corresponds to results by Orcutt and Cochrane on the efficiency of
LS estimates from autocorrelated data.¹⁹

The estimates by LISE are on the average very good, and except for γ₁₁,
about 88 per cent of the estimates are at least twice their standard errors, and
including γ₁₁, 80 per cent of estimates are greater than two standard errors.
In addition, 94 per cent of the estimates are within two standard errors of the 
parameters. The standard deviation of the estimates is slightly greater than 


19 Orcutt, G. H. and Cochrane, D., op. cit.


.10 -25 .80 4.4 3.4 
— .2483 .5021 . 1347 .3918 .2703 -6632 |4.217 |3.842 
.1179 -0824 .0562 0668 .0885 .3144 | .4618 | .5002 
.0959 .0782 -0551 .0575 .0849 .3078 | .4970 | .4602 
-1922 .1280 -0665 .0886 .0908 .3438 | .5078 | .6672 
86 120 120 102 69 
44 32 ll 22 7 18 
8 4 9 10 8 10 
.4559 2442 -8188 [4.480 (3.457 
.0770 .0924 .3387 | .5045 | .5437 
.0675 .0903 .3543 | .7799 | .7185 
.0772 .0926 .3392 | .5107 | .5466 
91 83 
5 5 
5 5

the average of the standard errors in five out of six cases, reversing their relationship
in Table 395b.

The LS estimates are improved but are still biased due to the correlation 
between y₂ and the shocks, u₁ and u₂. This correlation is not due to autonomous
growth as in Model I but is traceable, in this experiment, to the influence of
u₁ and u₂ in the reduced forms used to generate y₂.²⁰ The standard errors of the
LS estimates have been reduced by about 50 per cent and, as in the LISE case 
they are smaller on the average than the standard deviation of the estimates. 
Only 81 per cent of the LS estimates are within two standard errors of the 


TABLE 397. MEAN VALUES OF COEFFICIENTS WITH MEASURES OF
DISPERSION FOR 120 SAMPLES OF 25 OBSERVATIONS EACH BY
LISE AND LS METHODS. DATA WITH NO TRENDS IN z’S
AND u’S. MODEL I


Parameters @ 


Least Squares Estimates 
. Mean of Estimates, 6 
. Standard Deviation of Estimates, o% 
. Mean of Standard Errors 
. Root Mean Square Errors 
. @ >2 Standard Errors 
. |@—e| >2 Standard Errors 


~ 


. |@—@>2 Standard Errors 


Limited Information Estimates 
. Mean of Estimates, @ 
. Standard Deviation of Estimates, % 
. Mean of Standard Errors 
. Root Mean Square Errors 
. @ >2 Standard Errors 
>2 Standard Errors 
@ —6| >2 Standard Errors 


* Only 118 samples used. 


parameters, and 81 per cent of these estimates are as much as twice their stand- 
ard errors. The smaller standard errors obtained by LS methods indicate 
greater reliability for a group of estimates which are clearly less satisfactory 
than those obtained by LISE methods. 

Table 397 summarizes the use of Model I on data which meet all the require- 
ments of the LISE method except that the exogenous variables are varied 
stochastically from sample to sample. The results are very similar, for both the 
LS and LISE methods, to the results in Table 396. For the averages of the 
estimates by LISE, Tables 396 and 397 differ only in the third decimal place. 

An examination of all three tables indicates that, as one would expect, the
LS standard errors are always smaller on the average than the comparable
LISE standard errors. However, even when the growth elements are removed
from the data (Table 397), the bias in LS estimates seen in Table 396 persists.
LS methods may do very badly on data and models very similar to those used in 


20 Bronfenbrenner, op. cit.


Biz Bee yu yu 
—.40 .60 .10 -25 .80 
— .2556 -4985 .1335 .3928 .2738 -6512 i 
1049 .0804 .0568 .0834 - 2932 
.0930 .0761 -0550 .0560 -0826 . 2988 
.1783 1295 -0695 .0823 -0864 .3288 
90 120 81 120 107 72 
46 33 12 24 8 10 
7 9 8 9 10 10 2 
— .4264 . 5920 -0993 -4564 2468 . 8091 
-1281 .0918 .0671 -0747 .0870 .3201 
-1198 .0914* .0609 -0656 -0881* .3245* 
1308 -0922 -0677 .0750 .0870 -3202 
115 i18* 52 120 92 88* 
7 9* 5 5 5 3* 
10 9* 5 7 5 3*

TABLE 398. MEAN VALUES OF COEFFICIENTS WITH MEASURES OF
DISPERSION FOR 120 SAMPLES OF 25 OBSERVATIONS EACH BY
LISE AND LS METHODS. DATA WITH TRENDS IN z’S BUT
NOT IN u’S FOR BOTH MODEL I AND MODEL II


                                  |  β₁₂    |  β₂₂  |  γ₁₁  |  γ₁₂  |  γ₂₃  |  γ₂₄  |  γ₁₅    |  γ₂₅
Parameters $\theta$               | −.40    | .60   | .10   | .45   | .25   | .80   |  0      |  0
Model I
Least Squares Estimates
1. Mean of Estimates              | −.2527  | .5400 | .1335 | .3911 | .2937 | .6396 |         |
2. Mean of Standard Errors        |  .0932  | .0630 | .0541 | .0494 | .0800 | .3040 |         |
Limited Information Estimates
3. Mean of Estimates              | −.4237  | .6396 | .0910 | .4609 | .2254 | .9022 |         |
4. Mean of Standard Errors        |  .1192  | .0710 | .0598 | .0610 | .0859 | .3342 |         |
Model II
Least Squares Estimates
5. Mean of Estimates              | −.2619  | .5021 | .1315 | .3918 | .2970 | .6632 | −.0545  | .0058
6. Mean of Standard Errors        |  .1016  | .0781 | .0586 | .0610 | .0851 | .3078 |  .5093  | .4100
Limited Information Estimates
7. Mean of Estimates              | −.4224  | .5946 | .0932 | .4558 | .2442 | .8204 |  .0588  | .0707
8. Mean of Standard Errors        |  .1232  | .0936 | .0674 | .0623 | .0903 | .3541 |  .7197  | .6171


much empirical research due to the correlation between the shocks and the 
endogenous variables which are used as independent variables. 

The correlations among the exogenous variables are, with the exception
of r₃₄, from 2 to 30 per cent higher in the data used in Table 396 than in that
used in Table 397, due to the linear trends in the exogenous variables (see
Table 395a). Wold and Haavelmo have referred to multicollinearity of this type
and have pointed out that it can increase the standard errors of the estimates.²¹
The comparable standard errors are about the same in Tables 396 and 397,
however, indicating that an increase in multicollinearity, within the correlation
ranges found in this experiment, had very little effect on the size of the standard
errors. This is also borne out by Model I in Table 398, which uses data with
linear trends in the exogenous variables but with no autonomous growth in the
endogenous variables. The averages of the standard errors in these data differ
but little from those in Table 397. Table 398 also illustrates that disparate
rates of growth in variables, other than autonomous growth, make small differences
in the estimates or standard errors on the average. Model II shows
examples for which z₅ standard errors are up to twelve times the estimates on
the average when time is included but is not needed in the model. These results
can also be seen in Table 399.

Table 399 summarizes the use of Model II on data with no growth in any of 
the variables. A comparison of Tables 397 and 399 shows that the addition of
z₅, time, when it is not needed has only a small effect on the averages of the
estimates or their standard errors by both LS and LISE. The z₅ term under
these circumstances has no consistent tendency to make the estimates better or 
worse. The standard errors are always slightly larger with the inclusion of time 
in the model when there are no growth elements in the data. However, one in- 


21 Trygve Haavelmo, “Remarks on Frisch's confluence analysis and its use in econometrics,” T. C. Koopmans,
ed., Statistical Inference in Dynamic Economic Models, New York: John Wiley and Sons, Inc., 1950, pp. 259, 260;
H. Wold and L. Jureen, op. cit., pp. 46, 47.

TABLE 399. MEAN VALUES OF COEFFICIENTS WITH MEASURES OF
DISPERSION FOR 120 SAMPLES OF 25 OBSERVATIONS EACH BY
LISE AND LS METHODS. DATA WITHOUT TRENDS IN z’S
AND u’S. MODEL II


Parameters 0 —.40 -60 -10 -45 -25 -80 0 0 


Least Squares Estimates 
1. Mean of Estimates —.2540 | +. 
2. Mean of Standard Errors +.0959 |+.0781 | +.0552 |+.0579 | + .0849 | + .3068 
Limited Information Estimates 
3. Mean of Estimates — .4232 |+ 
4. Mean of Standard Errors +.1232 |+ 


dication that time should be omitted is when the estimate of its coefficient is 
small relative to its standard error as in Table 399. 

When autonomous growth complicated the estimation process the addition 
of z₅ (Tables 395b and 396) reduced the standard errors by about 50 per cent.
Under the conditions reported in Tables 398 and 399 the inclusion of z₅ slightly
increased them. The behavior of the standard errors when time is added in the 
model may provide a clue as to whether or not autonomous growth or similar 
complications are present in a given set of data and the model relating them. 


5. TESTS OF INDEPENDENCE OF SHOCKS AND RESIDUALS 


The Hart-von Neumann test of the ratio of the mean-square-successive difference
to the variance²² has been applied to the shocks and the observed residuals,
with the results shown in Table 400.
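The test statistic itself is simple to compute. The sketch below assumes the usual definition of the von Neumann ratio: the mean square successive difference (divisor n − 1) divided by the variance (divisor n). The divisors and the function name are assumptions; the critical values from the Hart-von Neumann tables are not reproduced here.

```python
def von_neumann_ratio(x):
    """Ratio of the mean square successive difference to the variance."""
    n = len(x)
    mean = sum(x) / n
    # Mean square successive difference, divisor n - 1 (assumed convention).
    delta2 = sum((b - a) ** 2 for a, b in zip(x, x[1:])) / (n - 1)
    # Variance about the sample mean, divisor n (assumed convention).
    s2 = sum((v - mean) ** 2 for v in x) / n
    return delta2 / s2


trend = [float(t) for t in range(1, 26)]   # a pure linear trend, n = 25
alternating = [1.0, -1.0] * 12             # strong negative autocorrelation

print(von_neumann_ratio(trend))        # small ratio: positive autocorrelation
print(von_neumann_ratio(alternating))  # large ratio: negative autocorrelation
```

For an independent series the ratio is expected to lie near 2, so a trend in the shocks pushes it toward zero, which is why a one-sided test is the natural choice when autonomous growth is suspected.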

To obtain the shocks for Model I the parameters β₁₂, γ₁₁, and γ₁₂ for equation 1,
and β₂₂, γ₂₃, and γ₂₄ for equation 2 were used in each sample of 25 to
obtain 25 estimates of y₁. For Model II, the additional parameters γ₁₅ and γ₂₅
were used. The differences between these estimates of y₁ generated by the use of
these parameters and the observed values of y₁ represent the true shocks to
which the Hart-von Neumann test was first applied.

We know from the construction of the data that both shock terms u₁ and u₂
contained trends—the autonomous growth—and hence the hypothesis of inde- 
pendence when applied to the true shocks should be rejected when Model I is 
used. This expectation is realized at both 5 and 1 per cent levels from all 120 
samples, using a one-sided test, as is shown in Table 400. 

Unfortunately, the shocks cannot be observed except in experiments of this 
kind. When the same test of independence is applied to the observed residuals, 
it is disturbing to note that the hypothesis of independence would be rejected 
at the 5 per cent level in only 1/6 to 1/3 of the samples and at the 1 per cent level
in less than 18 per cent of the samples, regardless of which method of estimation
is used. Cochrane and Orcutt have also observed this tendency, using LS, and
have referred to it as a bias toward randomness in the residuals.²³


22 Hart, B. I. and J. von Neumann, “Tabulation of the probabilities for the ratio of the mean square successive
difference to the variance,” Annals of Mathematical Statistics, Vol. 13, No. 2 (1942), pp. 207-14.
23 Op. cit.


Bis Ba yu | ys vu vis ys 

+ .0300 | + .01385 
+ .2838 |+.2527 
+ .00987| + .0210 
+ .3077 | +.2647

When Model II is used, or when the data are free from growth influences, both
shocks and residuals are expected to show no autocorrelation except as the
chance selection of random variables may cause it. In this case, both positive
and negative autocorrelation is found and a two-sided test was used at 10 and 2
per cent levels. The results shown in Table 400 conform to expectation reasonably
well at both 2 and 10 per cent levels. The inclusion of z₅ in Model II seems,
in this experiment, to free the shocks of autocorrelation.


TABLE 400. TESTS OF INDEPENDENCE HYPOTHESIS APPLIED TO
SHOCKS AND RESIDUALS WITH AND WITHOUT AUTONOMOUS
GROWTH AND FOR TWO MODELS
120 SAMPLES, n=25


                          With Autonomous Growth                    |  No Growth Element
                   Model I              |  Model II                 |
                   Rejected at          |  Rejected at              |  Rejected at
                 5% level  | 1% level   | 10% level  | 2% level     | 10% level  | 2% level
                 No.   %   | No.   %    | No.   %    | No.   %      | No.   %    | No.   %
Shocks
  Equation 1     120  100  | 120  100   |  13  10.8  |  3   2.5     |  14  11.7  |  3   2.5
  Equation 2     120  100  | 120  100   |   8   6.7  |  3   2.5     |   7   5.8  |  3   2.5
Residuals, LISE methods
  Equation 1      39  32.5 |   9   7.5  |   8   6.7  |  1    .8     |  12  10.0  |  1    .8
  Equation 2      20  16.7 |   5   4.2  |  12  10.0  |  2   1.7     |   8   6.7  |  1    .8
Residuals, LS methods
  Equation 1      29  24.2 |  10   8.3  |  15  12.5  |  3   2.5     |  14  11.7  |  1    .8
  Equation 2      28  23.3 |   8   6.7  |  11   9.2  |  2   1.7     |  10   9.2  |  2   1.7


5 and 1 per cent levels for a one-sided test of the hypothesis of no positive autocorrelation. The 10 and 2 per
cent tests are two-sided tests of the hypothesis of no autocorrelation.


6. CHI SQUARE TESTS 


The distribution of the difference between a parameter and its estimate divided
by its standard error, $(\hat\theta - \theta)/\hat\sigma_{\hat\theta}$, is usually treated as a Student ‘t’ distribution
of mean zero and variance one with n − k degrees of freedom, where k
is the number of variables in a particular equation. For the LISE estimates,
the ‘t’ distribution fits in every case for Model I when no growth is present in
the data and for Model II when autonomous growth is in the data. The ‘t’
distribution does not fit for LISE estimates when Model I is used on data containing
autonomous growth. LS estimates of $\theta$ are sufficiently biased that the
distribution of $(\hat\theta - \theta)/\hat\sigma_{\hat\theta}$ is skew and the ‘t’ distribution is an acceptable fit in
only 2 of 18 cases at the 5 per cent level. There are, no doubt, other hypotheses,
that of normality, for example, which might have passed the test equally well.


7. ESTIMATES USING POPULATION VALUES 


Using the population variances and covariances for the y’s and z’s, estimates
of the parameters and standard errors were made for data with and without
autonomous growth. One can think of these parameter estimates as reflecting
estimates which would be obtained if a sample of 25 were drawn whose variance-
covariance matrix was identical to the population variance-covariance matrix.
Table 401 shows the estimates made by LS and LISE methods for both
Models I and II. The standard error for each estimate is shown in parentheses
below the estimate.


TABLE 401. PARAMETER ESTIMATES USING POPULATION VARIANCES 
AND COVARIANCES IN A SAMPLE OF 25 


yu 


-80 


-6350 
(.2964) 
8000 


(.3173) 


-7134 
(.6095) 
-8577 
(.6285) 


C161) | (.5237) 


These results can be compared with lines 1 and 3 of Tables 395b, 396 and 397,
where close agreement will be found. Table 401 continues to show that LS estimates
have smaller standard errors than LISE estimates. It also illustrates,
using Model I with no autonomous growth and Model II with autonomous
growth, how the LISE method removes the bias inherent in LS due to the correlation
of y₂ and the shocks.


8. CONCLUSIONS 


Based on the results of this study, the trial inclusion of time as an additional 
term in the equations of a model seems justified when time series data are 
used. Economic analysis is best served when models are constructed of vari- 
ables which express defined economic forces. But if autonomous growth, or 
similar variations exist in the endogenous variables the estimates will be 
improved, and if autonomous growth is not present the estimates are not made 
worse on the average and the coefficient for time will be very small relative to 
its standard error. The ‘t’ term provides for the growth that would be explained 
in a larger, more complicated system of difference equations, but these larger 
systems are usually not feasible when the limited number of observations avail- 
able in time series is considered. Unfortunately, there is no test prior to making 
estimates which establishes the presence or absence of autonomous growth. 


24 Other types of variation in economic time series, such as cycles, may cause shocks to be correlated with other
variables. The inclusion of time as a linear function will not help, of course, if the correlations are not linear. 


Autono- 
Biz Bra yu vis yu 
10 45 25 m | 4.40 | 3.40 
Model I 
Ls No | —.2503 | .5068 | .1358 | .3919 | .2782 
(.0919) | (.0773) | (.0528) | (.0560) | (.0814) 
LISE No | —.4000 | .6000 | .1000 | .4500 | .2500 
(.1133) | (.0910) | (.0872) | (0633) | (.0856) 
Model I 
LS Yes | —.o189 | .9522 | .1670 | .5502 | .4120 
¢.1852) | ¢.1135) | ¢.1101) | (.1110) | (.1641) 
LISE Yes | —.2751 | .9958 | .1031 | .6760 | .3735 
(.2267) | ¢.1219) | (.1187) | (.1297) | (.1691) 
Model II 
Ls Yes | -.2503 | 5121 | | .3919 | .2717 | | 4.164 | 3.777 
(0947) | (.0792) | (.0536) | (.0572) | (.0792) | (.3036) | (.5011) | (.4707) 
LISE Ye |

The LISE estimates without trends in any variables and with the exogenous
variables changing stochastically from sample to sample behave about as
would be expected. This variation in the z’s does not seem to disturb the estimates,
although the standard errors have a larger dispersion than they would
have had if the z’s had not varied from sample to sample. This extension, using
small samples, seems justified when one considers the size and type of models
used by economists.

Tests of independence applied to the residuals reveal the bias toward randomness
due to bias in the estimates when autonomous growth is present in
Model I. Chi square tests indicate that confidence intervals established under
the assumption of a ‘t’ distribution and tests of hypotheses made under the
same assumption may lead to serious errors of inference when autonomous
growth is present and models similar to Model I are used.

This study permits only limited conclusions about multicollinearity, but no
evidence exists here to indicate that increases in correlation up to 30 per cent
among the exogenous variables, whose correlations originally ranged from 36
to 91 per cent, increase the standard errors.

Last, with present computing equipment, the advantage of simplicity of 
calculation possessed by LS is greatly diminished. Using our present program, 
estimates of all parameters, their standard errors and the appropriate eigen- 
value can be obtained by LISE methods for one of our samples in about nine 
seconds on the “Illiac.” LS estimates can be obtained only a little faster. 


ON THE PROBLEM OF MATCHING LISTS BY SAMPLES 


W. EDWARDS DEMING AND GERALD J. GLASSER
New York University 


This paper presents theory for estimation of the proportions of names 
common to two or more lists of names, through use of samples drawn 
from the lists. The theory covers (a) the probability distributions, ex- 
pected values, variances, and the third and fourth moments of the 
estimates of the proportions duplicated; (b) testing a hypothesis with 
respect to a proportion; (c) optimum allocation of the samples; (d) the
effect of duplicates within a list; (e) possible gains from stratification. 
Examples illustrate some of the theory. 


Statement of the problem. There are 2 or more long lists of names. Some
names may be common to some or all of the lists, and it is of some economic 
or scientific importance to discover how many. The lists may be very long: in 
practice they may run to several hundred thousand or millions of names. One 
example came up in Germany a few years ago where the government wished to 
know how many people receive regular cheques from several sources—for exam- 
ple, government payroll, social security, unemployment compensation, subsidy 
of one kind or another, ex-soldier’s allowance, and possibly other sources. An- 
other example is provided by a publisher of a magazine who wished to discover 
how many of his subscribers were on a list of executives, and on other special 
lists. 

An advertising agency or the marketing department of a firm must decide 
whether to use one or both of 2 lists of names available to them for an advertis- 
ing campaign. The number of names common to both lists would be, if they 
knew it, the key decision-parameter. A firm may wish to determine, by com- 
paring two lists, how many of their present employees worked there in some 
past year. The number of shareholders common to two or more companies, 
and the number of companies that do business in each of two states, are addi- 
tional problems which require the matching of lists. Library work affords other 
illustrations. 

This paper presents some statistical theory for the solution of such problems. 
Several of the results (including estimators for relevant parameters and approx- 
imations to their variances) have already appeared in a note by Goodman.! 
Here we apply and extend his work. 

The theory that we give here, like Goodman’s, is based on probability sam- 
ples from both lists. Incidentally, a sample from one list matched against the 
other in full (100%) presents only a simple case of random sampling from a 
finite population of attributes; it is also a limiting case of the general theory of 
matching two samples. 

Notation. The accompanying table shows the scheme of notation. a_1, a_2, ..., a_M
are distinct and ordered names on one list; b_1, b_2, ..., b_N are distinct and
ordered names on the other list. D names are common to both lists. We assume
that no name appears more than once on one list; however, we illustrate later 


1 Goodman, Leo A., “On the analysis of samples from k lists,” Annals of Mathematical Statistics, 23 (1952), 
pp. 632-4. 
the relaxation of this requirement. The number D is important: it is the number
that the publisher (for example) wishes to know. Let

$$p = \frac{D}{M} \qquad (1)$$

$$P = \frac{D}{N} \qquad (2)$$

It will suffice to estimate either p or P.

                                     List 1     List 2
                                     a_1        b_1
                                     a_2        b_2
                                     a_3        b_3
                                     ...        ...
                                     a_M        b_N
Number on the list                   M          N
Number common to both lists          D          D
Proportion common to both lists      p          P

The comparison of name $a_i$ in List 1 with name $b_j$ in List 2 gives

$$a_i b_j = \begin{cases} 1 & \text{if the 2 names are identical} \\ 0 & \text{otherwise.} \end{cases} \qquad (3)$$


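Then the number D of names common to both lists is

$$\sum_i \sum_j a_i b_j = D \qquad (4)$$

where i in the summation runs through List 1, and j runs through List 2. Let
names in the samples be

$x_1, x_2, \ldots, x_m$ from List 1
$y_1, y_2, \ldots, y_n$ from List 2

$$x_i y_j = \begin{cases} 1 & \text{if the 2 names are identical} \\ 0 & \text{otherwise.} \end{cases} \qquad (5)$$

We also define

$$d = \sum_i \sum_j x_i y_j \qquad [i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n] \qquad (6)$$

the number of names common to the 2 samples. d is a random variable.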


Sampling procedure 


(1) Draw by random numbers between 1 and M and without replacement m 
names from List 1. 

(2) Draw by random numbers between 1 and N and without replacement n 
names from List 2. 
(3) Compare every name in the sample from List 1 with every name in the
sample from List 2 to discover how many names are common to both samples. 
Let d be this number. 

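(4) Form the estimates

$$\hat{p} = \frac{N}{mn}\,d \qquad (7)$$

$$\hat{P} = \frac{M}{mn}\,d \qquad (8)$$

$$\hat{D} = \frac{MN}{mn}\,d = M\hat{p} = N\hat{P}. \qquad (9)$$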
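The four steps of the sampling procedure can be sketched in code. The Python below is only an illustration under stated assumptions: the function names are hypothetical, and the two synthetic lists use integers in place of names, with 300 values deliberately placed on both lists. Averaging the estimate of D over many repeated draws illustrates its unbiasedness.

```python
import random


def match_estimates(M, N, m, n, d):
    """Eqs. (7)-(9): estimates of p = D/M, P = D/N and D from d matches."""
    scale = d / (m * n)
    return N * scale, M * scale, M * N * scale


def count_matches(list1, list2, m, n, rng):
    """Steps 1-3: sample each list without replacement, count common names."""
    sample1 = rng.sample(list1, m)
    sample2 = rng.sample(list2, n)
    return len(set(sample1) & set(sample2))


def mean_estimate_of_D(list1, list2, m, n, reps, seed=0):
    """Average the estimate of D over repeated draws (illustrates unbiasedness)."""
    rng = random.Random(seed)
    M, N = len(list1), len(list2)
    total = 0.0
    for _ in range(reps):
        d = count_matches(list1, list2, m, n, rng)
        total += match_estimates(M, N, m, n, d)[2]
    return total / reps


if __name__ == "__main__":
    # Hypothetical lists: integers stand in for names; 1700..1999 are on both.
    list1 = list(range(0, 2000))       # M = 2000
    list2 = list(range(1700, 3200))    # N = 1500, so D = 300
    print(mean_estimate_of_D(list1, list2, m=200, n=150, reps=2000))
```

With these sizes the expected number of matches per draw is only 3, so any single estimate is rough; the average over many draws should settle near 300.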


For a problem that requires a statistical test, step 4 specifies regions of
acceptance and rejection, and not estimators. Suppose that we wish to test the
hypothesis $p = p_0$ against the alternative $p < p_0$. The region of rejection for a test
at level α is d<d*, where d* is an integer for which

$$P\{d < d^* \mid p_0\} = \alpha. \qquad (10)$$


The critical value d* may be determined by reference to the exact probability 
distribution of d, Eq. (11), or to the approximations afforded by either Eq. (12) 
or (13). An example appears later. 

Results for 2 lists. The estimates just formed by the sampling procedure given 
above possess the following properties: 


$\hat{p}$ is an unbiased estimate of p
$\hat{P}$ is an unbiased estimate of P
$\hat{D}$ is an unbiased estimate of D.

As Goodman pointed out, these estimators may under special conditions
lead to impossible results. For example, with N=M=1000, n=m=100 and
d=20, Eq. (7) shows that $\hat{p}$=2. However, the probability of an unreasonable
estimate is generally small, unless p or P is close to 1. Thus, impossible values of
$\hat{D}$, $\hat{p}$, or $\hat{P}$ may simply mean that practically all the names in the small list are
duplicates of those in the larger list.
The probability distribution of d, the number of names common to the 2 


which one may use to determine critical values for a statistical test and to 
compute the power of the test. Alternative forms appear later in Eqs. (42) 
and (43). 
We introduce 2 limiting cases. Case 1: M, N, m, n all increase without limit
in such manner that D, m/M, n/N remain fixed. Case 2: M, N, m, n, and D all
increase without limit in such manner that mnD/MN remains fixed at the
value λ. In Case 1

$$P(d) \to \binom{D}{d} f^{d} (1-f)^{D-d} \qquad (12)$$

where f=mn/MN. The limit in Case 1 is obviously a binomial with parameters
D and f. It is comparable to the binomial limit for the hypergeometric
distribution.² It gives a good approximation to the exact distribution Eq. (11) if M,
N, m, and n are reasonably large. In Case 2

$$P(d) \to \frac{e^{-\lambda}\lambda^{d}}{d!} \qquad (13)$$

which is a Poisson distribution. This equation also approximates the exact
distribution in Eq. (11) if f is small. In addition

$$\mathrm{Var}\,\hat{p} = \frac{Np}{mn}\left\{1 + (D-1)\,\frac{m-1}{M-1}\,\frac{n-1}{N-1}\right\} - p^{2} \qquad (14)$$

$$\to \frac{Np}{mn}\left\{1 - \frac{mn}{NM}\right\} \quad \text{Case 1} \qquad (15)$$

$$\to \frac{Np}{mn} \quad \text{Case 2.} \qquad (16)$$
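As a numerical check on Eqs. (12) to (14), the exact distribution of d can be computed by conditioning on k, the number of common names that fall into the sample from List 1, and then on how many of those k also fall into the sample from List 2. This two-stage hypergeometric form is one valid route to the exact distribution and is not claimed to be the form printed as Eq. (11); the function names are assumptions made for the sketch.

```python
from math import comb, exp, factorial


def hypergeom_pmf(k, population, marked, draws):
    """P(k marked items among `draws` draws without replacement)."""
    if k < 0 or k > marked or k > draws or draws - k > population - marked:
        return 0.0
    return (comb(marked, k) * comb(population - marked, draws - k)
            / comb(population, draws))


def exact_pmf(d, M, N, D, m, n):
    """P(d): condition on k common names landing in the List 1 sample."""
    return sum(
        hypergeom_pmf(k, M, D, m) * hypergeom_pmf(d, N, k, n)
        for k in range(d, min(D, m) + 1)
    )


def binomial_pmf(d, D, f):
    """Case 1 limit, Eq. (12), with f = mn/MN."""
    return comb(D, d) * f**d * (1 - f) ** (D - d)


def poisson_pmf(d, lam):
    """Case 2 limit, Eq. (13), with lam = mnD/MN."""
    return exp(-lam) * lam**d / factorial(d)


def var_p_hat(M, N, D, m, n):
    """Eq. (14), with p = D/M."""
    p = D / M
    ratio = (m - 1) * (n - 1) / ((M - 1) * (N - 1))
    return N * p / (m * n) * (1 + (D - 1) * ratio) - p**2


if __name__ == "__main__":
    M, N, D, m, n = 50, 40, 10, 20, 15      # small hypothetical lists
    f = m * n / (M * N)
    for d in range(5):
        print(d, exact_pmf(d, M, N, D, m, n),
              binomial_pmf(d, D, f), poisson_pmf(d, D * f))
```

For lists this small the two approximations are visibly rough; they tighten as M, N, m, and n grow, as the text states.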


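If the sample from List 2 is complete, then n=N, and the above formulas
for the probability distribution of d and for Var $\hat{p}$ reduce to

$$P(d) = \binom{D}{d}\binom{M-D}{m-d} \Big/ \binom{M}{m} \qquad (17)$$

the hypergeometric distribution, and

$$\mathrm{Var}\,\hat{p} = \frac{M-m}{M-1}\,\frac{pq}{m} \qquad [q = 1-p] \qquad (18)$$

Eqs. (14), (15), and (16) take the form

$$C_{\hat{p}}^{2} = \frac{N}{mnp}\left\{1 + (D-1)\,\frac{m-1}{M-1}\,\frac{n-1}{N-1}\right\} - 1 \qquad (19)$$

$$\to \frac{N}{mnp}\left\{1 - \frac{mn}{NM}\right\} \quad \text{Case 1} \qquad (20)$$

$$\to \frac{N}{mnp} \quad \text{Case 2} \qquad (21)$$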

2 Coggins, Paul P., “Some general results of elementary sampling theory for engineering use,” Bell System Tech- 
nical Journal, 7 (1928), p. 44. 
where $C_{\hat{p}}^{2}$ is the rel-variance of $\hat{p}$ (Var $\hat{p}$ divided by $p^2$). Var $\hat{P}$ and $C_{\hat{P}}^{2}$ follow
by symmetry. An unbiased estimate of Var $\hat{p}$ is


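$$\text{Est Var}\,\hat{p} = \frac{N}{mn}\,\hat{p}\left\{1 - \frac{mn}{MN} + (d-1)\left[1 - \frac{mn}{MN}\,\frac{(M-1)(N-1)}{(m-1)(n-1)}\right]\right\} \qquad (22)$$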


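$$\to \frac{N}{mn}\,\hat{p}\left\{1 - \frac{mn}{MN}\right\} \quad \text{Case 1} \qquad (23)$$

$$\to \frac{N}{mn}\,\hat{p} \quad \text{Case 2.} \qquad (24)$$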


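Est Var $\hat{P}$ follows by symmetry. For the higher central moment coefficients of
d, put f=mn/MN and define

$$\lambda_i = (D-i)\,\frac{m-i}{M-i}\,\frac{n-i}{N-i} \qquad (25)$$

Then

$$E(d - Ed)^{3} = \lambda_0\{1 - 3\lambda_0 + 3\lambda_1 + \lambda_1\lambda_2 + 2\lambda_0^{2} - 3\lambda_0\lambda_1\} \qquad (26)$$

$$\to \lambda_0(1-f)(1-2f) \quad \text{Case 1} \qquad (27)$$

$$\to \lambda_0 \quad \text{Case 2} \qquad (28)$$

$$E(d - Ed)^{4} = \lambda_0\{1 - 4\lambda_0 + 7\lambda_1 + 6\lambda_0^{2} + 6\lambda_1\lambda_2 - 12\lambda_0\lambda_1 - 3\lambda_0^{3} + 6\lambda_0^{2}\lambda_1 - 4\lambda_0\lambda_1\lambda_2 + \lambda_1\lambda_2\lambda_3\} \qquad (29)$$

$$\to 3\lambda_0^{2}(1-f)^{2} + \lambda_0(1-f)(1 - 6f + 6f^{2}) \quad \text{Case 1} \qquad (30)$$

$$\to 3\lambda_0^{2} + \lambda_0 \quad \text{Case 2.} \qquad (31)$$

It will be observed that Eqs. (27) and (30) agree with the corresponding
moment coefficients of the binomial of Eq. (12), while Eqs. (28) and (31) agree with
those of the Poisson distribution in Eq. (13).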


Examples for 2 lists 

(1) Probability samples of 900 and 1,800 are selected from lists of 40,000
and 20,000 names. They contain 16 duplicates. Eqs. (7), (8), and (9) give the
unbiased estimates

$$\hat{p} = \frac{20{,}000}{1800}\cdot\frac{16}{900} = .198$$

$$\hat{P} = \frac{40{,}000}{900}\cdot\frac{16}{1800} = .395$$

$$\hat{D} = \frac{20{,}000}{1800}\cdot\frac{40{,}000}{900}\cdot 16 = 7901$$
Eq. (22) gives

$$\hat{\sigma}_{\hat{p}} = .049$$

$$\hat{\sigma}_{\hat{P}} = .098$$

$$\hat{\sigma}_{\hat{D}} = 1974$$

Eqs. (23) and (24) lead to practically the same numerical estimates, because
m, n, M, and N are all big.
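The arithmetic of this example is easy to reproduce. The sketch below uses hypothetical helper names and computes the point estimates of Eqs. (7) to (9) together with the standard error of the estimate of D under the two limiting forms, Eqs. (23) and (24), scaled from p to D by the factor M.

```python
from math import sqrt


def point_estimates(M, N, m, n, d):
    """Eqs. (7)-(9) for the proportions p, P and the count D."""
    p_hat = N * d / (m * n)
    P_hat = M * d / (m * n)
    D_hat = M * N * d / (m * n)
    return p_hat, P_hat, D_hat


def se_D_case1(M, N, m, n, d):
    """Standard error of the estimate of D from Eq. (23), times M."""
    p_hat = N * d / (m * n)
    var_p = N / (m * n) * p_hat * (1 - m * n / (M * N))
    return M * sqrt(var_p)


def se_D_case2(M, N, m, n, d):
    """Standard error of the estimate of D from Eq. (24), times M."""
    p_hat = N * d / (m * n)
    return M * sqrt(N / (m * n) * p_hat)


if __name__ == "__main__":
    M, N, m, n, d = 40_000, 20_000, 900, 1_800, 16
    print(point_estimates(M, N, m, n, d))   # about (.198, .395, 7901)
    print(se_D_case1(M, N, m, n, d), se_D_case2(M, N, m, n, d))
```

Both standard errors come out close to one quarter of the estimate, which is what 16 observed duplicates would suggest (1/√16).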

(2) An advertising agency has 2 lists of names A and B, but cannot use
them as they are if too many names are common to both lists. List A contains
40,000 names; list B contains 10,000 names. The director of the agency specifies
that he wishes to take a risk no bigger than .01 of using the lists as they are
if 1000 names or more are common to both lists. This number would make P,
the proportion of duplicates in list B, equal to .1. If a test accepts the
hypothesis that P may be .1 or bigger, he will purge the lists of duplicates by
matching them 100%, or until tests of samples show that the duplicates have reached
the required level. The costs of sampling the lists are equal, wherefore we select
2000 names from each list (cf. the later section on allocation).

Statistically, the problem is to test the hypothesis P=.1 against the
alternative P<.1. As M, N, m, and n are all big, and as mn/MN is small, we may
use the Poisson approximation Eq. (13) with

$$\lambda = \frac{mnP}{M} = \frac{2000 \times 2000 \times .1}{40{,}000} = 10. \qquad (32)$$

The critical integer d* is the nearest integer that satisfies the equation

$$\sum_{d=d^*}^{\infty} \frac{e^{-\lambda}\lambda^{d}}{d!} = 1 - \alpha = .99 \qquad [\lambda = 10]. \qquad (33)$$

One may use Molina’s tables³ to find the critical value d*, which turns out
to be 4. The exact distribution Eq. (11) and the binomial limit Eq. (12) give
the same critical value. An easier way is to use the square-root transformation
with mean equal to √10 and with standard deviation ½, noting that the area
.01 under one tail corresponds to a standard deviate of 2.33, wherefore

$$\sqrt{10} - \sqrt{d^*} = 2.33 \times \tfrac{1}{2} = 1.165 \qquad (34)$$

whence √d* = 2 and d* = 4. Hence the statistical rule for decision requires
rejection of the hypothesis P=.1 and acceptance of the lists as they are if the
number d of duplicates in the 2 samples of 2000 turns out to be less than 4.

Eqs. (11), (12), or (13) give the probabilities in the accompanying table for 
the power of the test, with samples of 2000 from each list. 


P (proportion of duplicates
in List B) .005 .010 .025 .100


Probability of rejection of
the hypothesis P = .1 1.00− .98 .76 .26 .06 .00+
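The critical value and the power of this test can be recomputed under the Poisson approximation of Eq. (13). The sketch below is an illustration with hypothetical function names; it picks d* as the integer whose rejection probability at λ = 10 lies nearest the nominal level, following the wording used for Eq. (33), and then evaluates the probability of rejection for any chosen P.

```python
from math import exp


def poisson_cdf(k, lam):
    """P(d <= k) for a Poisson variate with mean lam."""
    if k < 0:
        return 0.0
    term = exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def critical_value(lam0, alpha, d_max=200):
    """Integer d* whose P(d < d* | lam0) is nearest to alpha."""
    return min(range(d_max + 1),
               key=lambda c: abs(poisson_cdf(c - 1, lam0) - alpha))


def power(P, M, m, n, d_star):
    """Probability of rejecting (d < d*) when the true proportion is P."""
    lam = m * n * P / M
    return poisson_cdf(d_star - 1, lam)


if __name__ == "__main__":
    d_star = critical_value(10.0, 0.01)
    print("d* =", d_star)
    for P in (0.005, 0.010, 0.025, 0.050, 0.075, 0.100):
        print(P, round(power(P, 40_000, 2_000, 2_000, d_star), 4))
```

The values of P in the loop are chosen for illustration and need not coincide with the columns of the printed table.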


3 Molina, E. C., Poisson's Exponential Binomial Limit, New York: D. Van Nostrand, 1942, Table II, p. 11.
Solution for 2 lists. Let $A_1, A_2, \ldots, A_D$ be the D names common to both lists.
Then the probability that a specified name $A_i$ will fall into both samples is

$$P(A_i) = \frac{mn}{MN} \qquad (35)$$

for all i. The probability that 2 specified names $A_i$ and $A_j$ will both fall into
both samples is

$$P(A_i, A_j) = \frac{m(m-1)}{M(M-1)}\cdot\frac{n(n-1)}{N(N-1)} \qquad (36)$$

for all i≠j. Similarly, for any specified set of k names,

$$P(A_1, A_2, \ldots, A_k) = \frac{m(m-1)\cdots(m-k+1)}{M(M-1)\cdots(M-k+1)}\cdot\frac{n(n-1)\cdots(n-k+1)}{N(N-1)\cdots(N-k+1)} \qquad (37)$$

for any set of k names, k≤D, k≤m, k≤n.
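To derive the distribution of d, one may apply a general rule of addition of
probabilities.⁴ Thus, if

$$S_1 = \sum_i P(A_i) = D\,\frac{mn}{MN} \qquad (38)$$

$$S_2 = \sum_{i<j} P(A_i, A_j) = \binom{D}{2}\frac{m(m-1)}{M(M-1)}\cdot\frac{n(n-1)}{N(N-1)} \qquad (39)$$

and

$$S_k = \binom{D}{k}\frac{m(m-1)\cdots(m-k+1)}{M(M-1)\cdots(M-k+1)}\cdot\frac{n(n-1)\cdots(n-k+1)}{N(N-1)\cdots(N-k+1)} \qquad (40)$$

then the probability distribution of exactly d names common to both samples is

$$P(d) = S_d - \binom{d+1}{d} S_{d+1} + \binom{d+2}{d} S_{d+2} - \cdots \pm \binom{D}{d} S_D \qquad (41)$$

whence


4 Feller, William, An Introduction to Probability Theory and its Applications, 2d ed., New York: John Wiley and
Sons, 1957, Chap. 4.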
M—k\/N 43)


M — D\(N - 


as already given in Eq. (11). 
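To derive the expected values and variances of $\hat{p}$ and of $\hat{P}$ we note that

$$Ed = E\sum_i\sum_j x_i y_j = \frac{mn}{MN}\sum_i\sum_j a_i b_j = \frac{mn}{MN}\,D$$

whence $E\hat{p} = p$ and $E\hat{P} = P$, as already recorded. Next,

$$Ed^{2} = E\sum_i\sum_j (x_i y_j)^{2} + E\sum_i\sum_j\sum_{i'}\sum_{j'} x_i y_j x_{i'} y_{j'} \qquad (i' \ne i,\ j' \ne j)$$

$$= \frac{mn}{MN}\sum_i\sum_j a_i b_j + \frac{m(m-1)}{M(M-1)}\cdot\frac{n(n-1)}{N(N-1)}\sum_i\sum_j\sum_{i'}\sum_{j'} a_i a_{i'} b_j b_{j'}$$

$$= \frac{mn}{MN}\,D + \frac{m(m-1)}{M(M-1)}\cdot\frac{n(n-1)}{N(N-1)}\,(D^{2} - D)$$

It follows that

$$\mathrm{Var}\,\hat{p} = E(\hat{p} - p)^{2} = E\hat{p}^{2} - p^{2} = \left(\frac{N}{mn}\right)^{2} Ed^{2} - p^{2}$$

$$= \frac{Np}{mn}\left\{1 + (D-1)\,\frac{m-1}{M-1}\,\frac{n-1}{N-1}\right\} - p^{2}$$

as already recorded in Eq. (14).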
Extension to L lists. Let d be the number of names common to samples of
sizes $n_1, n_2, \ldots, n_L$ drawn at random from lists of size $N_1, N_2, \ldots, N_L$ in
which D names are common to all L lists. The distribution of d, the number of
names common to all L samples, is


410 
(45) 
ote 
om 
= [1+ (46) 
and asymptotic results analogous to Eq. (12) and Eq. (13) include


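where

$$f = \prod_{i=1}^{L} \frac{n_i}{N_i} \qquad (50)$$

$$\lambda = Df. \qquad (51)$$

Put now

$$\hat{D} = d\prod_i \frac{N_i}{n_i} \qquad (52)$$

wherein i runs here and hereafter from 1 to L. Then $\hat{D}$ is an unbiased estimate
of D, and

$$\mathrm{Var}\,\hat{D} = D\prod_i \frac{N_i}{n_i}\left\{1 + (D-1)\prod_i \frac{n_i - 1}{N_i - 1}\right\} - D^{2} \qquad (53)$$

$$\to D\prod_i \frac{N_i}{n_i}\left\{1 - \prod_i \frac{n_i}{N_i}\right\} \quad \text{Case 1} \qquad (54)$$

$$\to D\prod_i \frac{N_i}{n_i} \quad \text{Case 2.} \qquad (55)$$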


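An unbiased estimate of this variance is

$$\text{Est Var}\,\hat{D} = \hat{D}\prod_i \frac{N_i}{n_i}\left\{1 - \prod_i \frac{n_i}{N_i} + (d-1)\left[1 - \prod_i \frac{n_i (N_i - 1)}{N_i (n_i - 1)}\right]\right\} \qquad (56)$$

$$\to \hat{D}\prod_i \frac{N_i}{n_i}\left\{1 - \prod_i \frac{n_i}{N_i}\right\} \quad \text{Case 1} \qquad (57)$$

$$\to \hat{D}\prod_i \frac{N_i}{n_i} \quad \text{Case 2.} \qquad (58)$$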


Optimum sample-sizes. For matching 2 samples let the costs be: 

$c_1$ to draw a name from List 1, and to write it down or to prepare a card
therefor, in preparation to compare it with the sample from List 2. $c_1$
includes also a proper share of the cost of sorting the cards of the sample
to put them in alphabetic order.

$c_2$ the same for List 2.

$c_3$ to compare a name in one sample with a name in the other sample, and
to record the comparison as 0 or 1.
Then the total cost of the job will be

$$K = mc_1 + nc_2 + mnc_3 \qquad (59)$$

the optimum sizes m and n being such that

$$mc_1 = nc_2 \qquad \text{[Approximately]} \qquad (60)$$

which means that to get the most precision for our money, we should choose
mn big enough to yield the required precision in $\hat{p}$ or in $\hat{P}$, and then equate the
costs of drawing the 2 samples. To derive this result, we satisfy ourselves with
Eq. (21) for Case 2, for which

$$C_{\hat{p}}^{2} = N/mnp.$$

This equation fixes the product mn, also the cost $mnc_3$ of matching. We now
write

$$(mc_1)(nc_2) = \frac{N c_1 c_2}{p\,C_{\hat{p}}^{2}} \qquad (61)$$

The right-hand side of this last equation is a number, once we fix N, $c_1$, $c_2$,
$C_{\hat{p}}^{2}$ and insert a plausible value of p. We may treat $mc_1$ as one variable, $nc_2$ as
another. If now we were to plot $mc_1$ on one axis of rectangular coordinates, and
$nc_2$ on the other, the graph of the last equation would be a hyperbola. The
coordinates of any point thereon are merely the costs of drawing the 2 samples.
The sum of these 2 costs, and hence also the total cost K, is a minimum where
the hyperbola meets the 45°-line $mc_1 = nc_2$. If the costs of drawing names from
the lists are equal ($c_1 = c_2$), an exact result for the optimum sizes is m=n.

For L lists, the optimum sizes of the samples would satisfy approximately
the equations

$$n_1 c_1 = n_2 c_2 = \cdots = n_L c_L \qquad (62)$$


An example in allocation of 2 samples. The procedure to find the optimum
sizes of the samples could then be this:

1. Choose a plausible value of p.

2. Choose the desired coefficient of variation, $C_{\hat{p}}$.

3. Find $mn = N/pC_{\hat{p}}^{2}$.

4. Find $m = \sqrt{mnc_2/c_1}$

$n = mc_1/c_2$.

Thus, suppose that N is 20,000, and that p may be about 5%. The client says
that $C_{\hat{p}}$ = 50% will be sufficient for his purpose. The costs, we suppose, are:

$c_1$ = $.50, $c_2$ = $.25, $c_3$ = $.001.

Then

$$mn = N/pC_{\hat{p}}^{2} = 20{,}000/(.05 \times .25) = 1{,}600{,}000$$

$$m = \sqrt{mnc_2/c_1} = \sqrt{1{,}600{,}000 \times .25/.50} = \sqrt{800{,}000} \doteq 900$$

$$n = mc_1/c_2 = 1800$$

and the total cost of the job would be

$$K = mc_1 + nc_2 + mnc_3 = .50 \times 900 + .25 \times 1800 + 1{,}600{,}000 \times .001 = \$2500. \qquad (63)$$


To compare this cost with proportionate allocation, we keep mn = 1,600,000,
but before we can go further, we must assume some value for M: let M=2N,
and m=2n, mn=2n². Then by Eq. (21),

$$C_{\hat{p}}^{2} = N/mnp = N/2n^{2}p$$

$$n^{2} = \tfrac{1}{2}N/pC_{\hat{p}}^{2} = 10{,}000/(.05 \times .5^{2}) = 800{,}000$$

n = 895
m = 1790
mn = 1,600,000 as before.


The total cost would be

$$K = mc_1 + nc_2 + mnc_3 = .50 \times 1790 + .25 \times 895 + \$1600 = \$895 + \$223.75 + \$1600 = \$2718.75$$

to compare with $2500 by the optimum allocation.


We compute now also, for comparison, the cost to attain the same precision 
with equal allocation, m=n: 


(64) 


$$C_{\hat{p}}^{2} = N/mnp = N/n^{2}p \qquad \text{(as before, from Eq. 21)}$$

$$n^{2} = N/pC_{\hat{p}}^{2} = 20{,}000/(.05 \times .5^{2}) = 1{,}600{,}000$$

n = 1265 = m.


The total cost would be in this case

$$K = mc_1 + nc_2 + mnc_3 = .50 \times 1265 + .25 \times 1265 + \$1600 = \$2548.75$$


which exceeds only slightly the cost for optimum allocation. 
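The allocation steps can be written as a short calculation. The sketch below is illustrative, with hypothetical function names; it takes the Case 2 relation of Eq. (21) to fix the product mn, applies the rule of Eq. (60), and evaluates the cost of Eq. (59) with the unit costs of the example ($.50, $.25, and $.001). It works with unrounded sample sizes, so its totals differ by a few dollars from the rounded figures in the text.

```python
from math import sqrt


def required_product(N, p, cv):
    """Product mn needed for coefficient of variation cv, from Eq. (21)."""
    return N / (p * cv**2)


def optimum_sizes(mn, c1, c2):
    """Sizes with m*c1 = n*c2 (Eq. 60) for a fixed product mn."""
    m = sqrt(mn * c2 / c1)
    n = m * c1 / c2
    return m, n


def total_cost(m, n, c1, c2, c3):
    """Eq. (59): cost of drawing both samples plus cost of matching."""
    return m * c1 + n * c2 + m * n * c3


if __name__ == "__main__":
    c1, c2, c3 = 0.50, 0.25, 0.001
    mn = required_product(N=20_000, p=0.05, cv=0.5)
    m_opt, n_opt = optimum_sizes(mn, c1, c2)
    m_eq = n_eq = sqrt(mn)                    # equal allocation, m = n
    print(mn, m_opt, n_opt)
    print(total_cost(m_opt, n_opt, c1, c2, c3))
    print(total_cost(m_eq, n_eq, c1, c2, c3))
```

Because the matching cost is fixed once mn is fixed, the allocations differ only in the cost of drawing the samples, which is why the equal allocation is so close to the optimum here.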

Duplicates within lists. We now drop the requirement that no name appear
more than once on a list.⁵ We restrict this excursion to 2 lists, and to the
possibility that some names occur twice on a list, but not thrice nor more. Let
$D_{ij}$ be the number of names that appear i times on List 1 and j times on List 2.
Both i and j may be 0, 1, 2. Then if M' is the number of distinct names on List
1, likewise N' for List 2, and if D' is the number of distinct names common to
the 2 lists, then


(65) 


$$M' = M - (D_{20} + D_{21} + D_{22}) \qquad (66)$$

$$N' = N - (D_{02} + D_{12} + D_{22}) \qquad (67)$$

$$D' = D_{11} + D_{12} + D_{21} + D_{22}. \qquad (68)$$

5 Cf. Mosteller, Frederick (ed.), “Questions and Answers,” American Statistician (1949), no. 3, pp. 12-3; and
Goodman, Leo A., “On the estimation of the number of classes in a population,” Annals of Mathematical Statistics,
20 (1949), pp. 572-9.
Denote by $d_{ij}$ the number of names that appear i times in the sample from
List 1 and j times in the sample from List 2. Then the random variables


MM-1 
mM’ =M + da + (69) 
N N-1 


(doz + diz + (70) 


M-1N-1 M-1 N-1 
m—1 


are unbiased estimates of M', N', D', and $\hat{M}' + \hat{N}' - \hat{D}'$ is an unbiased estimate
of M'+N'-D', which is the number of distinct names on the 2 lists combined.

Eq. (71) reduces to Eq. (9) if $d_{12} = d_{21} = d_{22} = 0$; that is, if no duplication within
lists appears in the sample. This indicates that unless such duplication appears,
thereby invalidating our assumption that no name appear more than once on a
list, the theory of estimation presented up to this point is sufficient. The same
is true in the more general cases.

Stratified random sampling. In some applications it may be possible to
increase the efficiency of the sample results by the judicious use of stratification.
To go about it we divide each list into R strata, with the strata in one list in a
one-to-one correspondence with the strata in the other. The theory presented
assumes that any duplicates occur only in the corresponding strata. If the lists
are lists of names, we may accomplish stratification by reference to last initial,
geographical location, or by some other criterion.

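With $M_i$, $N_i$, $D_i$, $m_i$, $n_i$ and $d_i$ representing the appropriate characteristics of
the i-th stratum and the sample selected from it, an unbiased estimate of D is

$$\hat{D}_s = \sum_{i=1}^{R} \frac{M_i N_i}{m_i n_i}\,d_i \qquad (72)$$

The variance of this estimate is

$$\mathrm{Var}\,\hat{D}_s = \sum_{i=1}^{R}\left[\frac{M_i N_i}{m_i n_i}\,D_i\left\{1 + (D_i - 1)\,\frac{(m_i - 1)(n_i - 1)}{(M_i - 1)(N_i - 1)}\right\} - D_i^{2}\right] \qquad (73)$$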


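and an unbiased estimate of this variance is

$$\text{Est Var}\,\hat{D}_s = \sum_{i=1}^{R}\left(\frac{M_i N_i}{m_i n_i}\right)^{2} d_i\left\{1 - \frac{m_i n_i}{M_i N_i} + (d_i - 1)\left[1 - \frac{m_i n_i}{M_i N_i}\cdot\frac{(M_i - 1)(N_i - 1)}{(m_i - 1)(n_i - 1)}\right]\right\} \qquad (74)$$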


The optimum allocation of a sample of fixed size, to minimize the variance of
$\hat{D}_s$, involves the requirement that $m_i = n_i$ for all strata, and that

$$m_i = m\,\frac{\sqrt[3]{D_i M_i N_i}}{\sum_{i=1}^{R}\sqrt[3]{D_i M_i N_i}} \qquad [i = 1, 2, \ldots, R] \qquad (75)$$

where m(=n) is the size of the sample to select from all strata combined
in List 1 (or in List 2). The accompanying table illustrates the optimum
allocation with a hypothetical example.


Stratum    M_i     N_i     D_i    (D_i M_i N_i)^(1/3)    m_i    n_i
1          100     200      12            62              27     27
2          200     100      13            64              28     28
3          300     350      36           156              68     68
4          400     350      39           176              77     77
Total     1000    1000     100           458             200    200


Var $\hat{D}$ = 2321 [by Eq. (21)]
Var $\hat{D}_s$ = 2223 [by Eq. (73)]
Var $\hat{D}_s$ / Var $\hat{D}$ = .96 (76)


One requires some assumption about the unknown values of the $D_i$ in order
to apply Eq. (75). In the absence of any other hints, we might in some
applications assume each $D_i$ proportional to the smaller of the $M_i$ and $N_i$.
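The hypothetical stratified example can be recomputed as follows. This sketch rests on an assumption: that the allocation rule of Eq. (75) makes the stratum sample sizes proportional to the cube root of the product of D, M, and N within each stratum, which is the reading that reproduces the sample sizes in the table. The variance uses the exact within-stratum form of Eq. (73); the function names are hypothetical.

```python
def allocate(strata, m_total):
    """Sizes m_i = n_i proportional to the cube root of D_i * M_i * N_i."""
    weights = [(D * M * N) ** (1.0 / 3.0) for M, N, D in strata]
    total = sum(weights)
    return [round(m_total * w / total) for w in weights]


def var_D_hat(M, N, D, m, n):
    """Exact variance of the estimate of D for one stratum or one pair of lists."""
    ratio = (m - 1) * (n - 1) / ((M - 1) * (N - 1))
    return M * N / (m * n) * D * (1 + (D - 1) * ratio) - D**2


def var_stratified(strata, sizes):
    """Eq. (73): sum of the within-stratum variances, with m_i = n_i."""
    return sum(var_D_hat(M, N, D, s, s) for (M, N, D), s in zip(strata, sizes))


if __name__ == "__main__":
    # (M_i, N_i, D_i) for the four strata of the table
    strata = [(100, 200, 12), (200, 100, 13), (300, 350, 36), (400, 350, 39)]
    sizes = allocate(strata, 200)
    print(sizes)
    print(var_stratified(strata, sizes))          # stratified design
    print(var_D_hat(1000, 1000, 100, 200, 200))   # unstratified, same totals
```

The gain from stratification in this example is modest, about 4 per cent of the variance, in line with the two variances quoted with the table.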


ON VARIANCES OF RATIOS AND THEIR DIFFERENCES IN 
MULTI-STAGE SAMPLES* 


LESLIE KISH AND IRENE HESS
University of Michigan 


We discuss the computation of variances for the estimators r=y/x
and (r−r’) where the random variables (variates) y and x are sample
totals for two variates obtained from some multi-stage design. The
variate x often represents the sample size; then the ratio r=y/x is the
simple and usual sample mean or proportion and a common statistic
for presenting the results of sample surveys.

The difference (r—r’) occurs frequently and importantly in multi- 
stage samples either as the change in the estimates of the same charac- 
teristic obtained from two different surveys, or as the comparison of 
some characteristic estimated for two subclasses from the same survey. 
Several useful computational forms are presented for var(r —r’) = var(r) 
+var(r’) —2 cov(r, r’). 

The aims of the presentation are: (a) to be general enough to cover 
the complexities which arise frequently in practical sample designs; (b) 
to provide easy computing formulas for good approximations; and (c) to 
make the procedures comprehensible to nontechnicians. 


1. INTRODUCTION


WE DESCRIBE in detail the computation of the variances for ratio estimators
which occur frequently in the analysis and presentation of results of
sample surveys. Large-scale surveys, especially in the social sciences, are usually 
complex, multi-stage samples. The estimator used most frequently for present- 
ing means, proportions and many aggregates is a ratio estimator of the kind 
described here. 

The reader can find detailed discussions of ratio estimators in journals and 
in several books on sampling [2, 8], especially in those by Cochran [1], by 
Hansen, Hurwitz and Madow [4], and by Sukhatme [7]. We believe, however, 
that these are not readily understood by many research scientists who conduct 
sample surveys and analyze sample data, but who are not specialists in sam- 
pling; hence, the many instances when estimates of sampling errors are not com- 
puted, or only the roughest approximations are made. Frequently, these ap- 
proximations are poor and misleading; in particular, it is common practice to 
misuse the variance formulas of simple random sampling to approximate the 
variances of estimators obtained from multi-stage samples. That misuse results 
frequently in gross distortion of the true standard error, and it deprives es- 
timators of the desired probability levels of protection [6]. 

Our principal objectives for this article are given in the last paragraph of the 
summary. To be general enough to cover most practical sampling situations, 
while using only simple computational formulas, presented us with conflicting 
challenges. We hope that we met them without serious theoretical flaws. How- 
ever, our goal was not theoretical perfection but help for the non-specialist 


* A revised version of a paper contributed to the 115th Annual Meeting of the American Statistical Associa- 
tion in New York on December 29, 1955. It benefited from a special “off-campus” assignment for research granted 
in 1958 by the Survey Research Center, University of Michigan to the senior author. As a Visitor in the Department 
of Statistics at Harvard University he received helpful suggestions from William G. Cochran. 
researcher who may wish to use our results. While we could not make this paper
into “light reading,” we do expect that the researcher with a specific problem 
will be able to follow our directions and apply them to his problems. These 
expectations have been justified by results with these methods in and out of the 
classroom. The numerical illustrations should help. 

Another aim of this paper is to present computational forms for the variances 
of the differences between (or comparisons of) two ratio estimators as they occur 
in multi-stage samples. The formulation of these variances is not available
explicitly in the literature, to the best of our knowledge. This is a most important
and urgent problem for the statistical analysis of survey data. 

In sections 2-6 we develop the statistical model to serve as an approximation
to the complexities of many practical situations, making our model general
and flexible, but keeping it simple. In sections 7-10 computational forms are
illustrated with numerical examples. In sections 11-13 three actual survey
designs illustrate uses and modifications of the model. In sections 14-15 the
derivations of the variance formulas are outlined.


2. CONDITIONS OF THE SAMPLE DESIGN 


We assume a design subject to five conditions; whenever these exist, the 
formulas given for the variances provide good, simple and useful approxima- 
tions for them. We judge that conditions (a), (b) and (c) occur frequently in 
sampling practice. While conditions (d) and (e) are seldom satisfied exactly, 
they can usually be approximated usefully. These conditions define the area 
of applicability of the formulas presented in this paper. That area is, we be- 
lieve, broad and important for multi-stage sampling in general and especially 
for social research. 

(a) The variance is required for the ratio (denoted by r=y/x) of two random
variables, y and x; these are sample totals (defined in section 3) for the variates 
“y” and “x” respectively. Or the variance is desired for the difference (r—r’) of 
two such ratios (section 5). 

The word “variance” is used throughout this paper to denote a computed 
value of the estimator of the “true” (parameter) variance. We avoid the ques- 
tions of optimum design: when should this ratio estimator be used? We accept 
the fact that it is used most frequently on socio-economic surveys; hence, a 
good estimate of its variance is needed. The merits of this “combined” or “over- 
all” ratio estimator and of its competitors are discussed in sampling books 
[1, pp. 129-134; 4, Vol. I, pp. 189-200]. It is relatively simple to compute and 
usually has little bias; very often it is a desirable estimator. 

(b) For each of the primary selections (PS’s) self-weighting estimates are 
provided for both the “y” and “x” variates. This condition (given in 3.5) is
usually simple enough to satisfy if precautions are taken to code PS identifica- 
tion numbers for the individual observations. 

Within these conditions, the selection of the various sampling units may be 
done either with equal or with variable probabilities (such as proportional to a 
measure of size). Moreover, any “weighting” of the observations needed to 
compensate for unequal probabilities of selection uses the same weights that 
are necessary to produce the ratio estimator itself (see 3.3b; and 3.3c). 


(c) The number of primary selections (PS’s) used is large enough: (1) for
the valid use of the “combined” ratio estimator; (2) for the valid use of the es- 
timator of the variance obtained by Taylor approximations; and (3) for the 
variance to be useful in the construction of approximate confidence intervals. 
For the meanings of these limitations the reader may refer to Cochran [1, pp. 
117-9 and 127-34], and to Hansen, Hurwitz, Madow [4, Vol. II, pp. 107-13]; 
an additional problem presented by the difference of two ratios is discussed in 
section 6. These conditions are often satisfied well enough and should be part 
of the sample design. Their disregard can lead to troubles which are prior to and 
more important than the lack of a precise estimate of the variance. As a rough
rule of thumb, we suggest: The selection procedure should be such that preferably
no PS contain more than 2½% of the sample. This calls for the selection
of about 40 PS’s or more. This criterion might be relaxed, under pressure, to 5%. 
When unable to satisfy this criterion consult the literature or consult a tech- 
nician. 

(d) All selections are made with random choices and with replacement. This 
condition is necessary to keep the formulas and computations of the variance 
relatively simple, based only on the sample PS totals and not requiring the 
computation of the separate components of the variance. Some modifications, 
involving sampling without replacement, are feasible and practical; these are 
discussed in 4a and 4b. 

(e) Two or more PS’s are selected independently from each stratum. Some
designs, frequently preferred and widely used, do not conform to this con- 
dition; in these situations the model of the computational design should be 
fitted with care as an approximation to the actual sample design. Some of these 
modifications are: 

(1) Actually only one PS is selected from each stratum; then pairs (or groups) 
of strata are “collapsed” to simulate two or more PS’s per stratum (see
4b). 

(2) The PS’s are selected with systematic selection (see 4c). In this case the 
variance can be calculated by “looping” pairs of the “collapsed” and 
“implicit” strata. Here, as in (1), the gain due to the additional stratifica- 
tion of the PS’s is disregarded (see 15.2 and 15.4). 

(3) The selection in each stratum is not independent, because of the use of 
“two-way stratification” or “controlled selection.” The consequent re- 
duction of the variance is disregarded. 

These modifications of (d) and (e) generally result in designs which have somewhat
smaller variances than the calculated variances will tend to show. The
discussion of the choice between either sacrificing some available precision for 
the estimator r, or accepting an exaggerated estimator of the variance of r does 
not belong here. We accept the situation that these modifications are practiced 
widely by competent researchers and statisticians. 

We aim in this paper to provide simple computational procedures for dif- 
ferent multi-stage sample designs which are actually widely used; we do not 
treat the problems of choosing an “optimum” design. An important practical 
virtue of sample design is simplicity, so that the design can be fulfilled and the 
variances (and other desired statistics) computed properly and economically. 
Deming’s fine recent paper [3] presents selection procedures designed specif-
ically to be simple, efficient and valid. We agree heartily with Deming [3, pp. 
34-9] and Keyfitz [5] on the advantages of two independent selections from 
each stratum; in section 7, especially in part 7b, we present this simple computa- 
tional model. A modification is discussed in 4a, and a useful approximation in 
4b. A useful alternative to this sometimes is systematic selection, treated in 4c 
and section 8. 


3. THE RATIO ESTIMATOR 


The population is divided into L strata. Within the g-th stratum there are
$M_g$ primary sampling units (PSU’s) from which $m_g$ primary selections (PS’s) are
made. We assume that $m_g \ge 2$, and that all selections are made independently
and with replacement; sampling without replacement is discussed in section 4.

The population total for the “y” variate is denoted as

$$Y = \sum_{g}^{L} Y_g = \sum_{g}^{L}\sum_{h}^{M_g} Y_{gh} \qquad (3.1)$$

where $Y_g$ is the population total for the g-th stratum and the sum of the $M_g$
quantities $Y_{gh}$, which are the PSU totals in the stratum.
The sample total for the “y” variate is

$$y = \sum_{g}^{L}\sum_{h}^{m_g} y_{gh} \qquad (3.2)$$

Here $y_{gh}$ is the sample total of the “y” variate for the h-th PS in the g-th stratum;
in the example in Table 100, the sample totals for the primary selections
are the 20 entries in Col. 2, and their sum is y=50. In common “self-weighting”
samples the $y_{gh}$ are obtained from the individual observations as simple sums
for the PS:

$$y_{gh} = \sum_{i}^{n_{gh}} y_{ghi} \qquad (3.3a)$$

where $y_{ghi}$ is the “y” variate for one of the $n_{gh}$ observations in the gh-th PS. If
the sample is not self-weighting, and the weighting is done by reproducing
punched cards, then the above represents the sum of the cards for the PS. In
other instances, the weights $w_{ghi}$ (these may be numbers inversely proportional
to the over-all selection probabilities) are assigned to the observations $y_{ghi}$
and the PS total is obtained as:

$$y_{gh} = \sum_{i}^{n_{gh}} w_{ghi}\,y_{ghi} \qquad (3.3b)$$

Sometimes the weight $w_{gh}$ may be assigned to the PS; then its total is obtained
as:

$$y_{gh} = w_{gh}\sum_{i}^{n_{gh}} y_{ghi} \qquad (3.3c)$$
For practical work we need to have a great deal of flexibility in the sample
design. We can have that so long as the nature of our estimates is of the kind
in which

$$E(Fy) = Y \quad \text{and} \quad E(Fy_g) = Y_g \qquad (3.4)$$

where E is the operator “expected value” of statistics. F is the “inflation factor”
which is uniform over the entire sample; in “self-weighting” samples it is
common practice to have F = 1/f, where f is the uniform, over-all effective sampling
rate. However, F may be any convenient constant of proportionality for the
entire sample such that the relations 3.4-3.5 are satisfied jointly. (Such a
convenient constant is illustrated in section 13.)

The relations 3.4 follow from the fundamental condition (2.b) we assume for
the sample design and which we may state now as

$$E(m_g F y_{gh}) = Y_g. \qquad (3.5)$$

In other words, from the g-th stratum, $m_g$ PS’s are selected into the sample.
For each PS the statistic $y_{gh}$ is prepared; each $y_{gh}$ is an unbiased estimate of
$Y_g/Fm_g$; and $\sum_h y_{gh}$ is the estimator of $Y_g$ in the estimator y. Throughout this
paper each of the $m_g$ selections is treated as a separate PS in the computations
even when two (or more) of these selections happen (with replacement) to come
from the same actual PSU. These are important conditions because they permit the use
of variance computations which are both generally applicable and relatively
simple. They hold well in most surveys. (Note that a great variety of sampling
methods may be used within the different PS’s.) They are necessary and
sufficient to justify the proposed variance formulas.

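Statements similar to these (3.1-3.5) about the “y” variate apply to the “x”
variate; the 20 values of $x_{gh}$ for our example are given in Col. 3 of Table 100
and the sample total is x=193. Now we define the ratio estimator:

$$r = \frac{y}{x} = \sum_{g}^{L}\sum_{h}^{m_g} y_{gh} \Big/ \sum_{g}^{L}\sum_{h}^{m_g} x_{gh} \qquad (3.6)$$

This is an estimator of the ratio

$$R = \frac{Y}{X} = \sum_{g}^{L}\sum_{h}^{M_g} Y_{gh} \Big/ \sum_{g}^{L}\sum_{h}^{M_g} X_{gh} \qquad (3.7)$$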

The variates $y'_{gh}$ and $x'_{gh}$ in columns 4 and 5 of Table 100 are the 20 PS values
for another similar ratio r’. The variate “x” is often simply the count of the
elements; in that case r is the sample mean of the variate “y” per “x” element;
then $x_{ghi}$ takes the value 1 for every element in the sample. Of course, this may
refer not only to the entire population included in the survey, but to some
specified subclass (“domain”) subjected to analysis. Moreover, if “y” is a
binomial variate, such that $y_{ghi}$=1 denotes the subclass in “x” with a specified
characteristic (and $y_{ghi}$=0 for all others), then r is the sample proportion in
“x” with the characteristic. This is the case in our example, in which r=50/193
= .259 is such a proportion.
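The definitions in (3.3a), (3.3b), and (3.6) translate directly into code. The sketch below is illustrative only: the function names and the small data set (two strata with two primary selections each) are hypothetical and are not the entries of Table 100.

```python
def ps_total(values, weights=None):
    """PS total: simple sum as in (3.3a), or weighted sum as in (3.3b)."""
    if weights is None:
        return sum(values)
    return sum(w * v for w, v in zip(weights, values))


def ratio_estimate(strata):
    """Eq. (3.6): r = y/x from (y_gh, x_gh) PS totals grouped by stratum."""
    y = sum(y_gh for stratum in strata for y_gh, _ in stratum)
    x = sum(x_gh for stratum in strata for _, x_gh in stratum)
    return y / x


if __name__ == "__main__":
    # Hypothetical PS totals: each inner list is one stratum.
    strata = [[(3, 10), (2, 9)], [(4, 12), (1, 8)]]
    print(ratio_estimate(strata))     # 10/39
    print(50 / 193)                   # the proportion quoted in the text
```

When “x” is a simple count of elements and “y” is a 0/1 variate, each pair holds the number of elements with the characteristic and the number of elements in that primary selection, and r is the sample proportion.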


4. SOME USEFUL MODIFICATIONS OF THE SAMPLE DESIGN 


a. Sampling Without Replacement. In many practical situations it may be
preferable to select PSU’s without replacement; one may select two or more
PSU’s from each stratum with equal probability and without allowing any of
them to be selected twice. This reduces some components of the total variance
by the “finite population correction,” but this factor is disregarded when our
simplified formulas are used. Taking it into account would require the
additional computation of other components of the variance and the use of
difficult formulas, and the neglect of this factor generally results in a slight
overestimation of the variance. In most cases a better approximation results from
using the factor $(1-f)$ or $(1-f_g)$, if these are not negligible [1, pp. 223-5; 8, pp.
226-9]. Suppose that one is anxious to retain an unbiased estimate of the
variance, and also wants to sample elements without replacement, yet retain
simplicity of variance computations. This could be achieved in ordinary
multi-stage sampling as follows: Select $m_g$ PS’s from the g-th stratum with
replacement in all stages. But if the same last stage unit is selected on two (or more)
draws, retain only one of them and make a substitute selection for the other.
In the variance computations treat each of the $m_g$ selections as a separate PS.
Then use the “finite population correction,” $(1-f)$, where $f$ is the uniform,
over-all sampling fraction; or, if this fraction varies among the strata, use $(1-f_g)$
for the g-th stratum; often this factor is negligible. This procedure is designed
to be the equivalent of dividing the g-th stratum into $M_g$ final sampling units
from which $m_g = f_g M_g$ are selected at random without replacement.

Selecting two or more PSU’s without replacement and with varying probabil- 
ities involves difficult technical problems that we shall avoid here. 

b. One Sample PSU per Stratum. These technical difficulties lead to the
frequent practice of selecting a single PSU from each stratum. An exact measure
of variance must then be abandoned in favor of an approximation based on
“collapsed strata,” or “grouped strata” [4, Vol. I, p. 400, p. 422; 1, pp. 105-6; 7,
pp. 399-404]. For example, consider the selection of 2L PS’s from a population.
First divide the population into L strata; then divide each of these into two
strata, usually of roughly equal size. Now, from each of these, select a single
PSU. The variance computations utilize the L “pseudo strata” and treat the
pair of selections from each as coming from the same stratum. This approximation
results in overestimating the variance in proportion to the effectiveness of
the last step of stratification. In many practical situations this overestimation is
accepted as unimportant and preferable to sacrificing whatever advantage the
last step of stratification yields. This generally will be small compared to the
effects of the L strata which are properly represented in the variance
computations.

The use of groups larger than two in “collapsed strata” would, in most cases,
somewhat decrease the variability of the variance estimate, but it would
increase its overestimation; it would also sacrifice the computational simplicity
of two PS’s in each stratum (section 7). Often, these several advantages can
largely be preserved by using the (2L-1) “linked” comparisons of systematic
samples (section 8).

c. Systematic Selection of PSU’s. Systematic sampling serves often as a practical
method for selecting PSU’s with probabilities proportional to measures of
size. If the total of M PSU’s in the population have a total measure of

$$ \sum_h^M P_h, $$

then L' PSU’s are selected by using an interval of

$$ \frac{1}{L'} \sum_h^M P_h. $$

(For a reasonable comparison, the number of PS’s in this section and in the
preceding should be equal; hence, L' = 2L.) The joint effects of ordering and
selecting in many practical situations amount to approximating L' “implicit”
strata with one random selection from each [1, pp. 179-83; 4, Vol. II, pp. 503-12].
The variance is approximated generally with one of two procedures. In the
first, pairs of successive selections are regarded as coming from the same
“pseudo-stratum” (L'/2 in all); the variance computations (based on L'/2
differences) follow those mentioned in b, and illustrated in section 7. In the
second procedure all the (L'-1) successive differences, “linked” as 1-2, 2-3,
..., (L'-1)-L', are used in the variance computations (section 8).
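The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `systematic_pps`, the optional `start` argument and the toy measures of size are hypothetical and do not come from the paper. A PSU whose measure exceeds the interval may be hit more than once; each hit is kept as a separate selection, in line with the convention of section 3.

```python
import random


def systematic_pps(sizes, n_select, start=None):
    """Systematic selection with probability proportional to size.

    sizes    : measures of size P_h of the M PSU's, in their listed order
    n_select : number of selections L'
    start    : optional random start in [0, interval); drawn if omitted
    Returns the list of selected PSU indices (0-based).
    """
    total = sum(sizes)
    interval = total / n_select              # (1 / L') * sum of P_h
    if start is None:
        start = random.random() * interval   # random start in [0, interval)
    points = [start + k * interval for k in range(n_select)]

    selected, cumulated, idx = [], 0.0, 0
    for h, size in enumerate(sizes):
        cumulated += size
        # every selection point falling inside this PSU's range picks it
        while idx < n_select and points[idx] < cumulated:
            selected.append(h)
            idx += 1
    return selected


# Toy example: four PSU's, two selections, interval = 100 / 2 = 50.
print(systematic_pps([10, 20, 30, 40], 2, start=5.0))   # points 5 and 55
```

With the ordered list serving as an implicit stratification, successive selections can then be paired or linked for the variance computations as described above.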

In the above we mentioned successive selections, but actually the pairing can 
be done in several ways obeying the criteria: 1) Pair strata which are alike, so 
that the pairing will not result in large overestimation of the variance. 2) The 
pairings should be based on the strata, and should ignore (preferably by pre- 
ceding) the random results of selection. These criteria are relevant also to the 
selection method discussed in b above. 

d. Varying Numbers of PS’s from the Strata. The selection of two PS’s from
each stratum is treated in section 7. The selection of a constant number $m_g$,
greater than two, of PS’s from each stratum is illustrated in detail in section 9.

A variable number $m_g > 2$ of PS’s per stratum can be treated with formulas
14.11 and 14.12 as simple modifications of the methods in section 9. However, if
one can establish a few “stratum groups” such that the method of selecting PS’s
is the same for all strata within each group, but varies among the groups, it
can be handled with formulas 14.16. This is illustrated for two “stratum groups”
in section 13.

e. Creating Pseudo-PS’s for Variance Calculations. For computing variances 
one may create “pseudo-PS’s” from the PS results. This procedure should fol- 
low the stratification as well as the clustering of the sample design, so that the 
actual variance will be reflected accurately in the computations ([4, Vol. I, pp. 
440-4] and section 12). For example, six sample PS’s selected randomly from 
one stratum can be grouped at random into two “pseudo-PS’s,” each composed 
of three PS’s. As another example consider that a city forms one of the strata of 
a national sample and that 10 blocks have been selected systematically from 
the city; then the selected blocks 1, 3, 5, 7, 9 can compose one “pseudo-PS” and
blocks 2, 4, 6, 8, 10 the other; the contrast of the two “pseudo-PS’s” yields an
estimate of the variance contribution from the city (stratum). (This is
illustrated in section 7.) Creating pseudo-PS’s may be useful in any of the following
situations:

1) The sample size of the PS is too small for acceptable variance estimates. 
This may be true either for all strata or only for some of them. It can 
occur in dealing with subclasses (“domains”) of the sample, even when 
the entire sample is large. 

2) It is desired to reduce the computing work by using fewer differences. 

3) When constructing “pseudo-strata” (as in b above) there may be an odd 
PS left over; this can then be combined with another PS into a “pseudo- 
PS” to keep the computations simple. 


5. THE DIFFERENCES AND MULTIPLES OF RATIO ESTIMATORS 


Several important statistics can be derived simply from ratio estimators. 
a. The Difference of Two Ratios. As for any two random variables, it is true 
for r and r’ that 


var(r — r’) = var(r) + var(r’) — 2 cov(r, r’) (5.1) 
where each of these terms is an estimator, term by term, of

$$ E[(r - r') - (R - R')]^2 = E(r - R)^2 + E(r' - R')^2 - 2E(r - R)(r' - R'). \tag{5.2} $$


Here, r is the estimator of the parameter R and “var” and “cov” stand for “the 
variance of” and “the covariance of” respectively. 

The two ratio estimators r and r’ are being compared. For estimators ob- 
tained from independent samples the term cov(r, r’) is zero; but in multi-stage 
sampling, where the same sampling units are used to obtain both estimates, it 
may be an important term. Three common situations arise for which the same 
computational formulas can be used; in each of these the covariance term arises 
from the use of the same PS’s to obtain the two ratios being compared. 

1) Two periodic survey samples are obtained from the PS’s and the results 
are compared. For example, for two successive yearly samples, the pro- 
portions of car buyers are compared. This may be done either for the en- 
tire sample or for a subclass, such as an occupation class. 

2) Two subclasses of the same sample are compared for the same character- 
istic. For example, the proportions of car buyers are compared for two 
occupation classes of the same sample. 

3) For the same sample base two characteristics are compared. For example,
one can compare the proportions of new car buyers and used car buyers.
In this case $x = x'$ and $(r - r') = (y/x - y'/x) = (y - y')/x$; the general
formulas can be used, but a modification is also available for this special
case (14.4).

b. Multiples of Ratios. If C is any known constant, then the variance of the
statistic Cr can be computed simply as:

$$ \mathrm{var}(Cr) = C^2\, \mathrm{var}(r). \tag{5.3} $$

Similarly, one can compute the variance of the difference (Cr - C'r') as:

$$ \mathrm{var}(Cr - C'r') = C^2\, \mathrm{var}(r) + C'^2\, \mathrm{var}(r') - 2CC'\, \mathrm{cov}(r, r'). \tag{5.4} $$

These statistics arise frequently in the following manner.

1) The known constant C is the population total $X$, and $Xr = Xy/x$ is used
to estimate the population total $Y$. For example, the total assets ($Y$) of a
population might be estimated by using the Census count of families
($X$) to “inflate” a sample estimate ($r = y/x$) of assets per family.

2) The known constant C is the population average $\bar{X} = X/N$ and $\bar{X}r = \bar{X}(y/x)$
is used to estimate the population average $\bar{Y} = Y/N$. For example, the
average assets per family ($\bar{Y}$) of a population might be estimated by using
a reliable Census estimate of income per family ($\bar{X}$) to multiply a sample
estimate ($r = y/x$) of the ratio of family assets to income.

Note, first, that bias may arise insofar as the operations of measuring
$X$ or $\bar{X}$ actually differ from the measurement operations for “x” in the
sample. Secondly, if C is not a constant, but a variable (c) subject to error,
formula 5.3 is replaced by [4, Vol. I, p. 513]:

$$ \mathrm{var}(cr) = c^2\left[\mathrm{var}(r) + \frac{r^2}{c^2}\,\mathrm{var}(c) + \frac{2r}{c}\,\mathrm{cov}(c, r)\right]. \tag{5.5} $$

If the sample of c is independent of the sample of r, the last term vanishes.
Insofar as c is found to be based on a much more accurate (larger) sample
than r, the second term may be neglected, and 5.3 used as a practical
approximation of 5.5.
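As a small illustration of how 5.3 arises as a special case, the sketch below codes the variance of a multiple and the usual first-order approximation for the variance of a product, $c^2[\mathrm{var}(r) + (r/c)^2\,\mathrm{var}(c) + 2(r/c)\,\mathrm{cov}(c, r)]$. The function names and the numbers in the example call are hypothetical, chosen only to show that the product formula collapses to $C^2\,\mathrm{var}(r)$ when the multiplier has no sampling error.

```python
def var_multiple(C, var_r):
    """Variance of C*r for a known constant C (formula 5.3)."""
    return C ** 2 * var_r


def var_product(c, r, var_c, var_r, cov_cr=0.0):
    """First-order approximation to var(c*r) when c is itself an estimate.

    cov_cr is zero when c and r come from independent samples.
    """
    return c ** 2 * (var_r + (r / c) ** 2 * var_c + 2 * (r / c) * cov_cr)


# Hypothetical inputs: multiplier 2000, ratio 0.259 with variance 8.4e-4.
exact_constant = var_multiple(2000, 8.4e-4)
noisy_multiplier = var_product(2000, 0.259, var_c=400.0, var_r=8.4e-4)
print(exact_constant, noisy_multiplier)
```

The second value exceeds the first by $r^2\,\mathrm{var}(c)$, which is the term that may be neglected when c rests on a much larger sample than r.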


6. SIMPLE VARIANCE FORMULAS 


a. Estimating the Variance. In each of the next three sections we present a 
different model for computing the variances of r and (r—r’). We hope that non- 
specialists can learn to use these formulas by working the numerical illustra- 
tions of these sections. The three models represent the different sample designs 
encountered most frequently on surveys; and a good approximation to most 
surveys can be found among them—perhaps with the aid of a technician. In 
each model the number of PS’s per stratum is held constant throughout the 
sample; for treatments of variable PS’s, see 14.11, 14.12 and 14.16. 

For brevity’s sake one set of data is utilized to illustrate the three models.
These yield different answers due chiefly to sampling variability. The data
(Table 427) were obtained in the Detroit Area Study of 1954-1955. The PS, of
which there were 82, was the Census tract. In the illustrations the 82 PS’s have
been combined into 20 “pseudo-PS’s”; in the computations these are treated as
20 PS’s. The four sets of PS totals in columns 2 through 5 comprise all the data
we need; in columns 2 and 3 are shown the PS totals of the “y” and “x” variates,
respectively, that go into forming the first ratio $r = y/x$; in columns 4 and 5 are
the similar PS totals that go into the second ratio $r' = y'/x'$. Each of the two
ratios represents a proportion having a certain characteristic based on a
subclass of the entire sample. These ratios are found to be

$$ r = \frac{y}{x} = \frac{\sum\sum y_{gh}}{\sum\sum x_{gh}} = \frac{50}{193} = .259 $$

and

$$ r' = \frac{y'}{x'} = \frac{73}{171} = .427. $$

For each model we present the variance of the difference of two ratio estimators 
in terms of the variances of the two ratios and their covariances: 


var(r — r’) = var(r) + var(r’) — 2 cov(r, r’). (5.1) 


The first two terms are usually wanted separately for their own sakes; when 
they are not needed, another computational form (14.13) may be used. 

For each model two alternate computational forms (a and b) are presented.
These give the same results, except for rounding errors, and can be used to check
each other. One form appears in terms of the variances and covariances of x
and y; the other form is in equivalent terms of the useful computational
variate “z”; this can be defined in terms of the basic PS values as $z_{gh} = y_{gh} - r x_{gh}$.
The proper computations define $\mathrm{var}(z) = [\mathrm{var}(y) + r^2\,\mathrm{var}(x) - 2r\,\mathrm{cov}(y, x)]$;
but $z$ remains undefined. The computational forms are derived in sections 14
and 15. Formula 5.1 is represented, term by term, in the “z formula” (14.12),
thus:

$$ \mathrm{var}(r - r') = \frac{1}{x^2}\,\mathrm{var}(z) + \frac{1}{x'^2}\,\mathrm{var}(z') - \frac{2}{xx'}\,\mathrm{cov}(z, z'). \tag{6.1} $$

The equivalent of both 5.1 and 6.1 is given, term by term, for the “d formula”
(14.11) by:

$$ \begin{aligned}
\mathrm{var}(r - r') ={}& \frac{1}{x^2}\left[\mathrm{var}(y) + r^2\,\mathrm{var}(x) - 2r\,\mathrm{cov}(y, x)\right] \\
&+ \frac{1}{x'^2}\left[\mathrm{var}(y') + r'^2\,\mathrm{var}(x') - 2r'\,\mathrm{cov}(y', x')\right] \\
&- \frac{2}{xx'}\left[\mathrm{cov}(y, y') + rr'\,\mathrm{cov}(x, x') - r'\,\mathrm{cov}(y, x') - r\,\mathrm{cov}(y', x)\right].
\end{aligned} \tag{6.2} $$

For cases of “equal cluster sizes” (that is, $x_{gh} = \bar{x}_g$) the variances and
covariances involving “x” vanish; and 6.2 reduces to

$$ \mathrm{var}(r - r') = \frac{\mathrm{var}(y)}{x^2} + \frac{\mathrm{var}(y')}{x'^2} - \frac{2\,\mathrm{cov}(y, y')}{xx'}. $$


b. The Bias of the Ratio Estimator. In using $\sqrt{\mathrm{var}(r)}$ for statistical inference
one must be aware of the bias of the ratio estimator relative to the standard
error. This has been shown [1, p. 118] to be less than the coefficient of
variation of x, which is $\sqrt{\mathrm{var}(x)}/x$ and decreases with the number of sampling
units. For samples using moderate to large numbers of sampling units the bias
will not be serious. However, in judging the importance of the bias in the ratio
estimator $(r - r')$ of $(R - R')$ in relation to its standard error, $\sqrt{\mathrm{var}(r - r')}$, an
additional problem may arise when two conditions exist: 1) The covariance term
$-2\,\mathrm{cov}(r, r')$ causes a large reduction in $\mathrm{var}(r - r')$; and 2) the biases in $(r - R)$
and $(r' - R')$ are very dissimilar so that their difference has a large bias.

These conditions will not exist in two kinds of cases. First, if r and r’ are in- 
dependent, the covariance term vanishes. Secondly, if both ratios are based on 
the same denominator (14.4), this problem will not arise. However, in other 
conditions, we should be aware of the problem. First, one should look at the 
term, —2 cov(r, r’): if it is small the problem is not serious. A tentative rule 
of thumb might be that, if 2 cov(r, r’) is larger than the smaller of var(r) 
and var(r’), then the bias should be investigated. 

Fortunately, for this investigation the bias of r can be approximated (14.17)
by $[r\,\mathrm{var}(x) - \mathrm{cov}(y, x)]/x^2$. Hence one should compare the bias in $(r - r')$
expressed as:

$$ \frac{r\,\mathrm{var}(x) - \mathrm{cov}(y, x)}{x^2} - \frac{r'\,\mathrm{var}(x') - \mathrm{cov}(y', x')}{x'^2} $$

with the standard error, $\sqrt{\mathrm{var}(r - r')}$. Note that all the terms in the bias appear
in the computations of the variances using the “d formulas” (14.11). Finally, 
to judge the importance of the bias in relation to the standard error, consult 
Cochran [1, p. 9] and Hansen, Hurwitz, Madow [4, Vol. I, p. 58]. The latter is 
particularly relevant because it uses the fact that the formulas for var(r) and 
var(r—r’) are actually approximations for the mean square error [4, Vol. II, 
p. 108; 7, p. 153] rather than the variance only. Thus, even a bias as large as 
one-half of the standard error has little effect on two-sided confidence intervals. 
In most cases, this limit would not be exceeded, we believe. In our computations 
the bias has been but a small fraction of the standard error. 
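A rough numerical check of this comparison can be sketched as follows. It assumes that the sums of squares and products of the ten paired differences of section 7 (109 and 35 for the first ratio, 455 and 334 for the second) may stand in for var(x), cov(y, x), var(x') and cov(y', x'), exactly as they do in the “d formula” for Model I; that substitution and the variable names are assumptions of this sketch.

```python
from math import sqrt

# Sample totals and ratios of the example.
x_tot, xp_tot = 193, 171
r, rp = 50 / x_tot, 73 / xp_tot

# Assumed stand-ins for var(x), cov(y, x), var(x'), cov(y', x'):
# the Model I sums over the 10 paired differences (section 7).
var_x, cov_yx = 109, 35
var_xp, cov_ypxp = 455, 334

# var(r - r') as computed for Model I in section 7.
var_diff = 27.97e-4

bias_r = (r * var_x - cov_yx) / x_tot ** 2
bias_rp = (rp * var_xp - cov_ypxp) / xp_tot ** 2
bias_diff = bias_r - bias_rp

ratio = abs(bias_diff) / sqrt(var_diff)
print(f"bias(r)  ~ {bias_r:.2e}")
print(f"bias(r') ~ {bias_rp:.2e}")
print(f"|bias of (r - r')| / s.e. ~ {ratio:.3f}")
```

Under these assumptions the bias of the difference comes to less than one-tenth of its standard error, well inside the one-half limit mentioned above.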

c. Comparison with Simple Random Sampling. It is often useful to compute 
an estimate of the variance that a simple random sample (s.r.s.) of the same size 
would have yielded. First, it can serve as a check against large blunders; this 
check depends on the knowledge one develops about the range within which the 
ratio of the actual variance to s.r.s. variance tends to lie for a given design and 
material. Secondly, the relative regularity of that range facilitates extrapolation 
from computed to uncomputed variances; this aids in designing samples for new 
variables. For binomial variables, which occur often, the estimation of the
equivalent variance is particularly easy, since it is simply:

$$ \text{s.r.s. var}(r) = \frac{r(1 - r)}{x - 1} = \frac{y(x - y)}{x^2(x - 1)}. \tag{6.3} $$

In our example

$$ \text{s.r.s. var}(r) = \frac{50(193 - 50)}{193^2(192)} = 9.997 \times 10^{-4} $$

and

$$ \text{s.r.s. var}(r') = \frac{73(171 - 73)}{171^2(170)} = 14.39 \times 10^{-4}. $$

The sum of the two, $24.39 \times 10^{-4}$, can be compared to $\mathrm{var}(r - r')$ because the
two proportions are based on mutually exclusive groups.
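The arithmetic of 6.3 is easily checked. The short sketch below (the function name is hypothetical) reproduces the two s.r.s. variances and their sum from the sample totals y = 50, x = 193, y' = 73, x' = 171.

```python
def srs_var_proportion(y, x):
    """Simple-random-sampling variance of the proportion r = y/x (formula 6.3)."""
    return y * (x - y) / (x ** 2 * (x - 1))


v = srs_var_proportion(50, 193)
vp = srs_var_proportion(73, 171)
print(f"s.r.s. var(r)  = {v * 1e4:.3f} x 10^-4")
print(f"s.r.s. var(r') = {vp * 1e4:.2f} x 10^-4")
print(f"sum            = {(v + vp) * 1e4:.2f} x 10^-4")
```

Dividing a computed var(r) from sections 7-9 by the corresponding s.r.s. value gives the ratio of actual to s.r.s. variance discussed above.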
On the other hand, if the two proportions have the same base, the approximate
s.r.s. variance is:


var = 


x n—1 x(x — 1) 


7. VARIANCES FOR MODEL I: TWO PS’S PER STRATUM 


In this model, two PS’s are selected from each stratum; the computations
are based on the special convenient formulas developed in 14.11, 14.12 and
14.15. Here the number of strata L = 10; the PS’s are grouped into 10 pairs and
the pair of PS’s from each stratum are denoted by a and b. From each of
columns 8-13 of Table 427, we use only the deviations arising within the 10 pairs,
omitting the other 9 rows of figures, which are in parentheses. The 10 values
represent the 10 strata as g takes on the values 1, 2, ..., 10 successively. We
omitted as cumbersome the g subscripts from the deviations (the d units) in the
longer formulas, and hope that no confusion results from this.


TABLE 427. SAMPLE DATA AND BASIC CALCULATIONS ILLUSTRATING
VARIANCE ESTIMATION FOR THREE SAMPLE SELECTION
MODELS AND TWO COMPUTATIONAL TECHNIQUES

            Sample Data            Derived Computational Units
                                                     Differences for Models I and II
PS
Number     y    x    y'   x'        z        z'      dy     dx    dy'    dx'       dz        dz'
  1        2    3    4    5         6        7       8      9     10     11        12        13

  1        5   14    0    7      1.373   -2.988      3      2     -3     -3      2.482    -1.719
  2        2   12    3   10     -1.109   -1.269     (1)    (4)    (1)    (5)   (-0.036)  (-1.135)
  3        1    8    2    5     -1.073   -0.135     -2     -2     -4     -5     -1.482    -1.865
  4        3   10    6   10      0.409    1.731     (3)    (3)    (4)    (0)    (2.223)   (4.000)
Subtotal [11] [44] [11] [32]   [-0.400] [-2.661]

  5        0    7    2   10     -1.813   -2.269     -1     -5     -5     -5      0.295    -2.865
  6        1   12    7   15     -2.109    0.596    (-1)    (5)   (-9)   (-9)   (-2.295)  (-5.158)
  7        2    7   16   24      0.187    5.754     -1     -3     13     17     -0.223     5.742
  8        3   10    3    7      0.409    0.012     (1)    (2)    (0)   (-3)    (0.482)   (1.281)
Subtotal  [6] [36] [28] [56]   [-3.326]  [4.093]

  9        2    8    3   10     -0.072   -1.269     -1     -2      2      5     -0.482    -0.135
 10        3   10    1    5      0.409   -1.135    (-1)    (0)   (-4)   (-3)   (-1.000)  (-2.719)
 11        4   10    5    8      1.409    1.585      2      3      1      0      1.223     1.000
 12        2    7    4    8      0.187    0.585     (1)   (-1)    (3)    (5)    (1.259)   (0.865)
Subtotal [11] [35] [13] [31]    [1.933] [-0.234]

 13        1    8    1    3     -1.073   -0.281     -2      0     -4     -4     -2.000    -2.292
 14        3    8    5    7      0.927    2.012    (-2)   (-3)    (4)    (1)   (-1.223)   (3.573)
 15        5   11    1    6      2.150   -1.561      3      1     -1     -1      2.741    -0.573
 16        2   10    2    7     -0.591   -0.988     (2)    (1)    (1)    (2)    (1.741)   (0.146)
Subtotal [11] [37]  [9] [23]    [1.413] [-0.818]

 17        0    9    1    5     -2.332   -1.135     -3     -2      0     -1     -2.482     0.427
 18        3   11    1    6      0.150   -1.561    (-1)   (-3)   (-6)   (-7)   (-0.223)  (-3.012)
 19        4   14    7   13      0.373    1.450      0      7      4      8     -1.813     0.585
 20        4    7    3    5      2.187    0.865
Subtotal [11] [41] [12] [29]    [0.378] [-0.381]

Total     50  193   73  171
Form (a). The “d formula” (details derived in 14.11 and 14.15) utilizes the
deviations in columns 8-11 between pairs of PS totals, a and b, in the 10 strata.
The reader may verify that the entries of column 8 are obtained from those of
column 2; i.e., 5-2 = 3; 1-3 = -2, etc. Columns 9, 10 and 11 are derived similarly
from 3, 4, and 5. One first computes the L sets of deviations for “y,” “x,” “y',”
and “x'”:

$$ dy_g = (y_{ga} - y_{gb}) \quad \text{and} \quad dx_g = (x_{ga} - x_{gb}). $$

The estimated variance of r is computed from columns 8-9 as:

$$ \mathrm{var}(r) = \frac{1}{x^2}\left[\sum^L dy^2 + r^2 \sum^L dx^2 - 2r \sum^L dy\,dx\right] \tag{7.1} $$

$$ \mathrm{var}(r) = \frac{1}{(193)^2}\left[42 + (.259)^2(109) - 2(.259)(35)\right] = 8.371 \times 10^{-4}. $$

The reader may verify from column 8 that

$$ 3^2 + (-2)^2 + (-1)^2 + (-1)^2 + (-1)^2 + 2^2 + (-2)^2 + 3^2 + (-3)^2 + 0^2 = 42. $$

Also, from columns 8 and 9 that

$$ 3 \times 2 + (-2)(-2) + (-1)(-5) + (-1)(-3) + (-1)(-2) + 2 \times 3 + (-2) \times 0 + 3 \times 1 + (-3)(-2) + 0 \times 7 = 35. $$

From columns 10 and 11, we compute:

$$ \mathrm{var}(r') = \frac{1}{(171)^2}\left[257 + (.427)^2(455) - 2(.427)(334)\right] = 18.724 \times 10^{-4}. $$


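The covariance term is computed from columns 8-11 as:

$$ \begin{aligned}
2\,\mathrm{cov}(r, r') &= \frac{2}{xx'}\left[\sum^L dy\,dy' + rr' \sum^L dx\,dx' - r' \sum^L dy\,dx' - r \sum^L dy'\,dx\right] \\
&= \frac{2}{(193)(171)}\left[(-4) + (.259)(.427)(25) - (.427)(-8) - (.259)(14)\right] = -.877 \times 10^{-4}.
\end{aligned} \tag{7.2} $$

Finally,

$$ \mathrm{var}(r - r') = [8.371 + 18.724 - (-.877)] \times 10^{-4} = 27.97 \times 10^{-4}. $$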
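The computations of form (a) can be reproduced with a short script. This is a sketch, not the authors' program: the four lists hold the PS totals of columns 2-5 of Table 427 as reconstructed here (several cells are illegible in this copy and were filled in so that the printed subtotals, totals and sums of squares are reproduced), and the helper names are hypothetical.

```python
# PS totals from columns 2-5 of Table 427 (reconstruction; see lead-in).
y = [5, 2, 1, 3, 0, 1, 2, 3, 2, 3, 4, 2, 1, 3, 5, 2, 0, 3, 4, 4]
x = [14, 12, 8, 10, 7, 12, 7, 10, 8, 10, 10, 7, 8, 8, 11, 10, 9, 11, 14, 7]
yp = [0, 3, 2, 6, 2, 7, 16, 3, 3, 1, 5, 4, 1, 5, 1, 2, 1, 1, 7, 3]
xp = [7, 10, 5, 10, 10, 15, 24, 7, 10, 5, 8, 8, 3, 7, 6, 7, 5, 6, 13, 5]


def pair_diffs(v):
    """Within-stratum differences d_g = v_ga - v_gb for consecutive pairs."""
    return [v[i] - v[i + 1] for i in range(0, len(v), 2)]


def sp(a, b):
    """Sum of products of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


X, XP = sum(x), sum(xp)
r, rp = sum(y) / X, sum(yp) / XP
dy, dx, dyp, dxp = (pair_diffs(v) for v in (y, x, yp, xp))

# Formula 7.1 for each ratio, and formula 7.2 for twice the covariance.
var_r = (sp(dy, dy) + r ** 2 * sp(dx, dx) - 2 * r * sp(dy, dx)) / X ** 2
var_rp = (sp(dyp, dyp) + rp ** 2 * sp(dxp, dxp) - 2 * rp * sp(dyp, dxp)) / XP ** 2
two_cov = 2 * (sp(dy, dyp) + r * rp * sp(dx, dxp)
               - rp * sp(dy, dxp) - r * sp(dyp, dx)) / (X * XP)
var_diff = var_r + var_rp - two_cov

print(f"var(r)       = {var_r * 1e4:.3f} x 10^-4")
print(f"var(r')      = {var_rp * 1e4:.3f} x 10^-4")
print(f"2 cov(r, r') = {two_cov * 1e4:.3f} x 10^-4")
print(f"var(r - r')  = {var_diff * 1e4:.2f} x 10^-4")
```

The script carries r and r' at full precision, so its results can differ from the printed ones in the last digit.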


Form (b). The “z formula” (details derived in 14.12 and 14.15) uses the
quantities dz in columns 12-13. There are two alternative procedures for obtaining
these quantities. One can use the dy and dx of columns 8-11, thus:

$$ dz_g = dy_g - r\,dx_g \quad \text{and} \quad dz_g' = dy_g' - r'\,dx_g'. $$

In the first row we have for g = 1

$$ dz_1 = 3 - .259(2) = 2.482 \quad \text{and} \quad dz_1' = -3 - .427(-3) = -1.719. $$

This procedure has the advantages of somewhat fewer and shorter computations
and a simpler system of checks. The other procedure (described below) has
the advantage of utilizing units which are perhaps simpler for nonspecialists to
comprehend and to check for crude errors. These units may be thought of as
deviations in the $y_{gh}$ values from what could be “expected” according to the
average ratio r. These units are computed directly from columns 2-3 and from
4-5, respectively, for columns 6 and 7, thus:

$$ z_{gh} = y_{gh} - r x_{gh} \quad \text{and} \quad z_{gh}' = y_{gh}' - r' x_{gh}'. $$

From the z’s of columns 6 and 7, respectively, the dz’s of columns 12 and 13 are
formed:

$$ dz_g = z_{ga} - z_{gb} \quad \text{and} \quad dz_g' = z_{ga}' - z_{gb}'. $$

For g = 1 and for h = a and h = b we have in the first two rows

$$ z_{1a} = 5 - .259(14) = 1.373 \quad \text{and} \quad z_{1a}' = 0 - .427(7) = -2.988 $$

$$ z_{1b} = 2 - .259(12) = -1.109 \quad \text{and} \quad z_{1b}' = 3 - .427(10) = -1.269. $$

Thus, for g = 1 we have from the first two rows:

$$ dz_1 = z_{1a} - z_{1b} = 1.373 - (-1.109) = 2.482 $$

and

$$ dz_1' = z_{1a}' - z_{1b}' = -2.988 - (-1.269) = -1.719. $$

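After the L pairs of quantities dz and dz' are obtained by one of the two
alternative computing procedures, they can be used directly to compute from
columns 12-13:

$$ \begin{aligned}
\mathrm{var}(r) &= \frac{1}{x^2} \sum^L dz^2 = \frac{1}{(193)^2}\left[(2.482)^2 + (1.482)^2 + \cdots + (1.813)^2\right] = 8.371 \times 10^{-4} \\
\mathrm{var}(r') &= \frac{1}{x'^2} \sum^L dz'^2 = \frac{1}{(171)^2}\left[(1.719)^2 + (1.865)^2 + \cdots + (.585)^2\right] = 18.724 \times 10^{-4} \\
2\,\mathrm{cov}(r, r') &= \frac{2}{xx'} \sum^L dz\,dz' = \frac{2}{(193)(171)}\left[(2.482)(-1.719) + (-1.482)(-1.865) + \cdots + (-1.813)(.585)\right] \\
&= -.877 \times 10^{-4}.
\end{aligned} \tag{7.3} $$

Finally,

$$ \mathrm{var}(r - r') = [8.371 + 18.724 - (-.877)] \times 10^{-4} = 27.97 \times 10^{-4}. $$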
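Form (b) can be checked in the same way. The sketch below builds the z units of columns 6-7 from the PS totals, confirms that each set sums to zero, and applies 7.3. As before, the data lists are a reconstruction of columns 2-5 of Table 427 (some cells are illegible in this copy and were filled in to match the printed subtotals), and the helper names are hypothetical.

```python
# PS totals from columns 2-5 of Table 427 (reconstruction; see lead-in).
y = [5, 2, 1, 3, 0, 1, 2, 3, 2, 3, 4, 2, 1, 3, 5, 2, 0, 3, 4, 4]
x = [14, 12, 8, 10, 7, 12, 7, 10, 8, 10, 10, 7, 8, 8, 11, 10, 9, 11, 14, 7]
yp = [0, 3, 2, 6, 2, 7, 16, 3, 3, 1, 5, 4, 1, 5, 1, 2, 1, 1, 7, 3]
xp = [7, 10, 5, 10, 10, 15, 24, 7, 10, 5, 8, 8, 3, 7, 6, 7, 5, 6, 13, 5]

X, XP = sum(x), sum(xp)
r, rp = sum(y) / X, sum(yp) / XP

# z units (columns 6 and 7): deviations of y from its "expected" value r*x.
z = [yi - r * xi for yi, xi in zip(y, x)]
zp = [yi - rp * xi for yi, xi in zip(yp, xp)]


def pair_diffs(v):
    """Within-stratum differences d_g = v_ga - v_gb for consecutive pairs."""
    return [v[i] - v[i + 1] for i in range(0, len(v), 2)]


dz, dzp = pair_diffs(z), pair_diffs(zp)      # columns 12 and 13

var_r = sum(d * d for d in dz) / X ** 2
var_rp = sum(d * d for d in dzp) / XP ** 2
two_cov = 2 * sum(a * b for a, b in zip(dz, dzp)) / (X * XP)
var_diff = var_r + var_rp - two_cov

print(f"sum of z = {sum(z):.1e}, sum of z' = {sum(zp):.1e}")   # both ~ 0
print(f"dz_1 = {dz[0]:.3f}, dz_1' = {dzp[0]:.3f}")
print(f"var(r - r') = {var_diff * 1e4:.2f} x 10^-4")
```

Only three sums are cumulated here, against ten in form (a), which is the saving the text describes.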


8. MODEL II: SYSTEMATIC SELECTION OF PS’S


In this model, L PS’s are selected systematically; from these the (L-1)
successive differences are formed and used in the computations. These resemble the
computations for Model I and the definitions of the quantities $y_g$, $x_g$, $z_g$ and
dy, dx, dz are similar; see section 7 for the arithmetic details of computing the
d’s and the z’s for columns 6-13 of Table 427.

The computations of Model II differ from those of Model I in using the
multiplier L/2(L-1) and (L-1) differences. In our example (L-1) = 19, and
these differences are comprised of the 10 used in Model I plus the nine others
shown in parentheses in columns 8-13. Thus, the subscript g takes on
successively all values from 1 through 19. (The formulas are derived in section 15.)

Form (a). The “d formula” (details derived in 15.3) utilizes all the (L-1) = 19
differences of columns 8-11. In general, we first compute the (L-1) sets of
differences for g = 1, 2, ..., (L-1):

$$ dy_g = (y_g - y_{g+1}) \quad \text{and} \quad dx_g = (x_g - x_{g+1}). $$


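From columns 8-9 we compute the variance:

$$ \mathrm{var}(r) = \frac{1}{x^2} \cdot \frac{L}{2(L-1)}\left[\sum^{L-1} dy^2 + r^2 \sum^{L-1} dx^2 - 2r \sum^{L-1} dy\,dx\right] \tag{8.1} $$

$$ \mathrm{var}(r) = \frac{1}{(193)^2} \cdot \frac{20}{2(19)}\left[65 + (.259)^2(183) - 2(.259)(55)\right] = 6.893 \times 10^{-4}. $$

Similarly, from columns 10-11:

$$ \mathrm{var}(r') = \frac{1}{(171)^2} \cdot \frac{20}{2(19)}\left[433 + (.427)^2(658) - 2(.427)(495)\right] = 23.450 \times 10^{-4}. $$

The covariance is computed from columns 8-11 as

$$ \begin{aligned}
2\,\mathrm{cov}(r, r') &= \frac{2}{xx'} \cdot \frac{L}{2(L-1)}\left[\sum^{L-1} dy\,dy' + rr' \sum^{L-1} dx\,dx' - r' \sum^{L-1} dy\,dx' - r \sum^{L-1} dy'\,dx\right] \\
&= \frac{2}{(193)(171)} \cdot \frac{20}{2(19)}\left[25 + (.259)(.427)(9) - (.427)(20) - (.259)(-11)\right] = 6.477 \times 10^{-4}.
\end{aligned} \tag{8.2} $$

And, finally,

$$ \mathrm{var}(r - r') = [6.893 + 23.450 - 6.477] \times 10^{-4} = 23.87 \times 10^{-4}. $$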


Form (b). The “z formula” utilizes all the (L-1) = 19 values of dz and dz' in
columns 12-13. Two alternate procedures for obtaining these are described in
detail in section 7b; briefly, they are derived from columns 8-11 as

$$ dz_g = dy_g - r\,dx_g \quad \text{and} \quad dz_g' = dy_g' - r'\,dx_g'. $$

Or, alternately, from columns 2-7 as

$$ dz_g = z_g - z_{g+1} = (y_g - r x_g) - (y_{g+1} - r x_{g+1}) $$

and

$$ dz_g' = z_g' - z_{g+1}' = (y_g' - r' x_g') - (y_{g+1}' - r' x_{g+1}'). $$

After the (L-1) pairs of quantities dz and dz' are obtained (for columns
12-13) by one of the two alternative computing procedures, they can be used
directly to compute:

$$ \begin{aligned}
\mathrm{var}(r) &= \frac{1}{x^2} \cdot \frac{L}{2(L-1)} \sum^{L-1} dz^2 = \frac{1}{(193)^2} \cdot \frac{20}{2(19)}\left[(2.482)^2 + (.036)^2 + (1.482)^2 + (2.223)^2 + \cdots + (.223)^2 + (1.813)^2\right] = 6.893 \times 10^{-4} \\
\mathrm{var}(r') &= \frac{1}{x'^2} \cdot \frac{L}{2(L-1)} \sum^{L-1} dz'^2 = \frac{1}{(171)^2} \cdot \frac{20}{2(19)}\left[(1.719)^2 + (1.135)^2 + \cdots + (3.012)^2 + (.585)^2\right] = 23.450 \times 10^{-4} \\
2\,\mathrm{cov}(r, r') &= \frac{2}{xx'} \cdot \frac{L}{2(L-1)} \sum^{L-1} dz\,dz' = \frac{2}{(193)(171)} \cdot \frac{20}{2(19)}\left[2.482(-1.719) + (-.036)(-1.135) + \cdots + (-1.813)(.585)\right] = 6.477 \times 10^{-4}.
\end{aligned} $$

Finally,

$$ \mathrm{var}(r - r') = [6.893 + 23.450 - 6.477] \times 10^{-4} = 23.87 \times 10^{-4}. $$
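A sketch of the Model II computation in its “z” form follows. It differs from the Model I script only in forming all 19 successive differences and applying the multiplier L/2(L-1). The data lists are again a reconstruction of columns 2-5 of Table 427 (some cells are illegible in this copy and were filled in to match the printed subtotals), and the helper names are hypothetical.

```python
# PS totals from columns 2-5 of Table 427 (reconstruction; see lead-in).
y = [5, 2, 1, 3, 0, 1, 2, 3, 2, 3, 4, 2, 1, 3, 5, 2, 0, 3, 4, 4]
x = [14, 12, 8, 10, 7, 12, 7, 10, 8, 10, 10, 7, 8, 8, 11, 10, 9, 11, 14, 7]
yp = [0, 3, 2, 6, 2, 7, 16, 3, 3, 1, 5, 4, 1, 5, 1, 2, 1, 1, 7, 3]
xp = [7, 10, 5, 10, 10, 15, 24, 7, 10, 5, 8, 8, 3, 7, 6, 7, 5, 6, 13, 5]

L = len(y)                                   # 20 systematic selections
X, XP = sum(x), sum(xp)
r, rp = sum(y) / X, sum(yp) / XP

z = [yi - r * xi for yi, xi in zip(y, x)]
zp = [yi - rp * xi for yi, xi in zip(yp, xp)]


def successive_diffs(v):
    """Linked differences d_g = v_g - v_(g+1), g = 1, ..., L-1."""
    return [v[g] - v[g + 1] for g in range(len(v) - 1)]


dz, dzp = successive_diffs(z), successive_diffs(zp)
factor = L / (2 * (L - 1))                   # multiplier of Model II

var_r = factor * sum(d * d for d in dz) / X ** 2
var_rp = factor * sum(d * d for d in dzp) / XP ** 2
two_cov = 2 * factor * sum(a * b for a, b in zip(dz, dzp)) / (X * XP)
var_diff = var_r + var_rp - two_cov

print(f"var(r)       = {var_r * 1e4:.3f} x 10^-4")
print(f"var(r')      = {var_rp * 1e4:.3f} x 10^-4")
print(f"2 cov(r, r') = {two_cov * 1e4:.3f} x 10^-4")
print(f"var(r - r')  = {var_diff * 1e4:.2f} x 10^-4")
```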


9. MODEL III: NUMBER OF PS’S PER STRATUM CONSTANT AND GREATER THAN TWO


This model can be used whenever any constant number $m_g$ of PS’s is selected
independently from each stratum. In our example we form L = 5 strata, and
assume each consecutive four PS’s to be from one stratum. The variance
formulas are developed in section 14; they are based on PS deviations from their
stratum means as:

$$ Dy = Dy_{gh} = \left(y_{gh} - \frac{y_g}{m_g}\right) \quad \text{and} \quad Dx = Dx_{gh} = \left(x_{gh} - \frac{x_g}{m_g}\right). $$

These Dy and Dx units, together with the Dy' and Dx' for the other pair of
variates, may be used in form (a) by computing the ten terms of var(r - r'), each
a sum of $\sum m_g$ cross-products (four of them being squares). Or one can
construct the units for the “z formula”:

$$ Dz = Dz_{gh} = (Dy_{gh} - r\,Dx_{gh}) \quad \text{and} \quad Dz' = Dz_{gh}' = (Dy_{gh}' - r'\,Dx_{gh}') $$

and use form (b) by summing the three cross-products (two of them squares) over
the $L \times m_g$ terms. However, for both procedures we shall present equivalent
units which are easier to compute.

Form (a). The “d formula” (14.11 and 14.14) leads to ten cross-products
utilizing the Dy and Dx units or their equivalents derived directly from the original
data in columns 2-5. These equivalents are (with h = 1, 2, ..., $m_g$ and g = 1,
2, ..., L):


432 AMERICAN STATISTICAL ASSOCIATION JOURNAL, JUNE 1959 
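$$ \begin{aligned}
\sum^L Dy^2 &= \sum^L \sum^{m_g} y_{gh}^2 - \sum^L \frac{1}{m_g}\, y_g^2 \\
\sum^L Dx^2 &= \sum^L \sum^{m_g} x_{gh}^2 - \sum^L \frac{1}{m_g}\, x_g^2 \\
\sum^L Dy\,Dx &= \sum^L \sum^{m_g} y_{gh} x_{gh} - \sum^L \frac{1}{m_g}\, y_g x_g.
\end{aligned} \tag{9.1} $$

There are three similar terms involving the $Dy'^2$, $Dx'^2$ and $Dy'Dx'$; then the
cov(r, r') terms:

$$ \begin{aligned}
\sum^L Dy\,Dy' &= \sum^L \sum^{m_g} y_{gh} y_{gh}' - \sum^L \frac{1}{m_g}\, y_g y_g' \\
\sum^L Dx\,Dx' &= \sum^L \sum^{m_g} x_{gh} x_{gh}' - \sum^L \frac{1}{m_g}\, x_g x_g' \\
\sum^L Dy\,Dx' &= \sum^L \sum^{m_g} y_{gh} x_{gh}' - \sum^L \frac{1}{m_g}\, y_g x_g' \\
\sum^L Dy'\,Dx &= \sum^L \sum^{m_g} y_{gh}' x_{gh} - \sum^L \frac{1}{m_g}\, y_g' x_g.
\end{aligned} \tag{9.2} $$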

Each of these units can be computed quite simply. The first one is from column
2:

$$ \sum^L Dy^2 = (5^2 + 2^2 + \cdots + 4^2) - \tfrac{1}{4}(11^2 + 6^2 + 11^2 + 11^2 + 11^2) = 166 - 130 = 36.00. $$

The last term is computed from columns 3 and 4 as

$$ \sum^L Dy'\,Dx = (14 \times 0 + 12 \times 3 + \cdots + 7 \times 3) - \tfrac{1}{4}(44 \times 11 + \cdots + 41 \times 12) = 682 - 693 = -11.00. $$


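After the ten terms have been computed the three terms of the variance can be
estimated from them. Thus,

$$ \mathrm{var}(r) = \frac{1}{x^2} \cdot \frac{m_g}{m_g - 1}\left[\sum^L Dy^2 + r^2 \sum^L Dx^2 - 2r \sum^L Dy\,Dx\right] \tag{9.3} $$

$$ \mathrm{var}(r) = \frac{1}{(193)^2} \cdot \frac{4}{3}\left[36.00 + (.259)^2(78.25) - 2(.259)(26.25)\right] = 9.898 \times 10^{-4}. $$

Similarly, from the Dy' and Dx' units we get

$$ \mathrm{var}(r') = \frac{1}{(171)^2} \cdot \frac{4}{3}\left[184.25 + (.427)^2(252.25) - 2(.427)(192.5)\right] = 30.033 \times 10^{-4}. $$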

The covariance term uses the Dy, Dx, Dy' and Dx' units:

$$ 2\,\mathrm{cov}(r, r') = \frac{2}{xx'} \cdot \frac{m_g}{m_g - 1}\left[\sum^L Dy\,Dy' + rr' \sum^L Dx\,Dx' - r' \sum^L Dy\,Dx' - r \sum^L Dy'\,Dx\right] \tag{9.4} $$

$$ 2\,\mathrm{cov}(r, r') = \frac{2}{(193)(171)} \cdot \frac{4}{3}\left[13.25 + (.259)(.427)(18.75) - (.427)(14.75) - (.259)(-11.00)\right] = 9.595 \times 10^{-4}. $$

Finally,

$$ \mathrm{var}(r - r') = [9.898 + 30.033 - 9.595] \times 10^{-4} = 30.34 \times 10^{-4}. $$


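Form (b). The “z formula” (details derived in 14.12 and 14.14) leads to three
cross-products. These can be computed either from the Dz and Dz' units or,
preferably, from their equivalents which are derived from the z and z' units in
columns 6-7:

$$ \begin{aligned}
z_{gh} &= y_{gh} - r x_{gh}, \qquad z_{gh}' = y_{gh}' - r' x_{gh}' \\
\sum^L Dz^2 &= \sum^L \sum^{m_g} z_{gh}^2 - \sum^L \frac{1}{m_g}\, z_g^2, \qquad
\sum^L Dz'^2 = \sum^L \sum^{m_g} z_{gh}'^2 - \sum^L \frac{1}{m_g}\, z_g'^2 \\
\sum^L Dz\,Dz' &= \sum^L \sum^{m_g} z_{gh} z_{gh}' - \sum^L \frac{1}{m_g}\, z_g z_g'.
\end{aligned} \tag{9.5} $$

From these quantities the variances are computed:

$$ \begin{aligned}
\mathrm{var}(r) &= \frac{1}{x^2} \cdot \frac{m_g}{m_g - 1} \sum^L Dz^2 = \frac{1}{(193)^2} \cdot \frac{4}{3}\left\{\left[(1.373)^2 + (-1.109)^2 + \cdots + (2.187)^2\right] - \tfrac{1}{4}\left[(-.400)^2 + \cdots + (.378)^2\right]\right\} = 9.899 \times 10^{-4} \\
\mathrm{var}(r') &= \frac{1}{x'^2} \cdot \frac{m_g}{m_g - 1} \sum^L Dz'^2 = \frac{1}{(171)^2} \cdot \frac{4}{3}\left\{\left[(-2.988)^2 + (-1.269)^2 + \cdots + (.865)^2\right] - \tfrac{1}{4}\left[(-2.661)^2 + \cdots + (-.381)^2\right]\right\} = 30.030 \times 10^{-4} \\
2\,\mathrm{cov}(r, r') &= \frac{2}{xx'} \cdot \frac{m_g}{m_g - 1} \sum^L Dz\,Dz' = \frac{2}{(193)(171)} \cdot \frac{4}{3}\left\{\left[(1.373)(-2.988) + \cdots + (2.187)(.865)\right] - \tfrac{1}{4}\left[(-.400)(-2.661) + \cdots + (.378)(-.381)\right]\right\} = 9.599 \times 10^{-4}.
\end{aligned} \tag{9.6} $$

Finally, $\mathrm{var}(r - r') = [9.899 + 30.030 - 9.599] \times 10^{-4} = 30.33 \times 10^{-4}$.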
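The Model III computations can be sketched with one helper that returns the “equivalent” sum of cross-products, that is, the raw sum of products minus the sum of products of stratum totals divided by $m_g$. The data lists are a reconstruction of columns 2-5 of Table 427 (some cells are illegible in this copy and were filled in to match the printed subtotals); the helper name `dev_cross` is hypothetical.

```python
# PS totals from columns 2-5 of Table 427 (reconstruction; see lead-in).
y = [5, 2, 1, 3, 0, 1, 2, 3, 2, 3, 4, 2, 1, 3, 5, 2, 0, 3, 4, 4]
x = [14, 12, 8, 10, 7, 12, 7, 10, 8, 10, 10, 7, 8, 8, 11, 10, 9, 11, 14, 7]
yp = [0, 3, 2, 6, 2, 7, 16, 3, 3, 1, 5, 4, 1, 5, 1, 2, 1, 1, 7, 3]
xp = [7, 10, 5, 10, 10, 15, 24, 7, 10, 5, 8, 8, 3, 7, 6, 7, 5, 6, 13, 5]

M = 4                                        # PS's per stratum (L = 5 strata)
X, XP = sum(x), sum(xp)
r, rp = sum(y) / X, sum(yp) / XP


def dev_cross(a, b, m=M):
    """Sum of products of deviations from stratum means (formulas 9.1, 9.2)."""
    raw = sum(p * q for p, q in zip(a, b))
    strata = sum(sum(a[i:i + m]) * sum(b[i:i + m])
                 for i in range(0, len(a), m)) / m
    return raw - strata


f = M / (M - 1)
var_r = f * (dev_cross(y, y) + r ** 2 * dev_cross(x, x)
             - 2 * r * dev_cross(y, x)) / X ** 2
var_rp = f * (dev_cross(yp, yp) + rp ** 2 * dev_cross(xp, xp)
              - 2 * rp * dev_cross(yp, xp)) / XP ** 2
two_cov = 2 * f * (dev_cross(y, yp) + r * rp * dev_cross(x, xp)
                   - rp * dev_cross(y, xp) - r * dev_cross(yp, x)) / (X * XP)
var_diff = var_r + var_rp - two_cov

print(dev_cross(y, y), dev_cross(yp, x))     # 36.0 and -11.0, as in the text
print(f"var(r - r') = {var_diff * 1e4:.2f} x 10^-4")
```

Because r and r' are carried at full precision, the covariance term falls between the two printed values (9.595 for form a and 9.599 for form b), which differ only through rounding.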
10. SOME COMPUTATIONAL CONSIDERATIONS


Let us call the formulas using the 10 “d” terms (used in the forms a of sections
7-9) the “d formulas” and the formulas using the three “z” terms (used in the
form b of sections 7-9) the “z formulas.” Each of these may be represented by a
triangular matrix.

The “d formulas” (10.1)

                +dy     -r dx    -dy'    +r'dx'
    +dy          +       -2       -2       +2
    -r dx                +        +2       -2
    -dy'                          +        -2
    +r'dx'                                 +

    (upper left section: var(r); upper right section: -2 cov(r, r');
    lower right section: var(r'))

The “z formulas” (10.2)

                +dz     -dz'
    +dz          +       -2
    -dz'                 +

    (upper left: var(r); upper right: -2 cov(r, r'); lower right: var(r'))



In the “d formulas” (10.1) the four diagonal terms represent the
squared terms of variances, and the six other pairs of terms (one below and one
above the diagonal) represent the product terms of covariances. The + and -
signs indicate how these terms enter into the formulas of var(r), var(r'), and
var(r - r'). The upper left section represents the three “d” terms that appear in
var(r); and the upper right section represents the four “d” terms of cov(r, r').
Note that var(r) and var(r') are obtained as by-products in the computations
of var(r - r').¹

The “z formulas” have only three terms; but these require the preparatory
computations of the units z = y - rx. However, this procedure has some
advantages in simplicity. (1) The z terms are easy to check since their sum is
zero, and all large values of z can, and should, be checked easily against the y’s
and x’s. (2) There are only three summed terms; two of squares and one of the
cross-products. (3) These can be put into an automatic desk computer in the
form of $[dz \times 10^k + dz']$; these terms are then squared and cumulated. The
exponent k is some integer (like 7 or 8), large enough to separate the three
resulting terms but still leaving room for the sum of squares of the dz’s. Two separate
cumulations are necessary: one for like signs (either + + or - -), and another


1 The “d” matrix is actually the product matrix AA', where A is the 4 × C matrix of the four d units for each of
the C contrasts in the computations, and A' is its transpose. Thus each of the 10 terms is the sum of C squares or
products. For two PS’s per stratum (Model I), $C = L = m_T/2$, where L is the number of strata and $m_T$ is the total
number of PS’s; for the systematic selection of Model II, $C = L' - 1 = m_T - 1$; for Model III, $C = L m_g = m_T$. The “z”
matrix (10.2) is the product matrix BB', where B is the 2 × C matrix of the two dz units for each of the C contrasts
in the computations.
for opposite signs (either + - or - +). On the whole, we recommend them for
computations on a small or moderate scale and with desk computers.
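The product-matrix view of the “d” terms given in the footnote can be illustrated for Model I. The sketch forms the 4 × 4 matrix AA' from the ten paired differences and then recovers var(r - r') as a quadratic form with the weights (1/x, -r/x, -1/x', r'/x'), whose signs are those of the triangular matrix 10.1. The data lists are a reconstruction of columns 2-5 of Table 427 (some cells are illegible in this copy and were filled in to match the printed subtotals); the weight vector is this sketch's way of writing the signs and is not notation from the paper.

```python
# PS totals from columns 2-5 of Table 427 (reconstruction; see lead-in).
y = [5, 2, 1, 3, 0, 1, 2, 3, 2, 3, 4, 2, 1, 3, 5, 2, 0, 3, 4, 4]
x = [14, 12, 8, 10, 7, 12, 7, 10, 8, 10, 10, 7, 8, 8, 11, 10, 9, 11, 14, 7]
yp = [0, 3, 2, 6, 2, 7, 16, 3, 3, 1, 5, 4, 1, 5, 1, 2, 1, 1, 7, 3]
xp = [7, 10, 5, 10, 10, 15, 24, 7, 10, 5, 8, 8, 3, 7, 6, 7, 5, 6, 13, 5]


def pair_diffs(v):
    """Within-stratum differences for consecutive pairs (Model I)."""
    return [v[i] - v[i + 1] for i in range(0, len(v), 2)]


# A is the 4 x C matrix of d units (rows dy, dx, dy', dx'); here C = 10.
A = [pair_diffs(v) for v in (y, x, yp, xp)]

# AA': each entry is a sum of C squares or products (10 distinct terms).
AAt = [[sum(p * q for p, q in zip(row_i, row_j)) for row_j in A] for row_i in A]
for row in AAt:
    print(row)

X, XP = sum(x), sum(xp)
r, rp = sum(y) / X, sum(yp) / XP
w = [1 / X, -r / X, -1 / XP, rp / XP]        # signs as in matrix 10.1

var_diff = sum(w[i] * w[j] * AAt[i][j] for i in range(4) for j in range(4))
print(f"var(r - r') = {var_diff * 1e4:.2f} x 10^-4")
```

The upper left 2 × 2 block of AA' carries the three terms of var(r), the lower right block those of var(r'), and the off-diagonal block the four covariance terms.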

The “d formulas” are practicable when larger computing machines are used,
either a tabulator and a multiplying-punch, or a high-speed computer. These
formulas have the advantage of making available the 10 basic variance and
covariance components; these may be needed for themselves or for methodological
investigations. The “d formulas” have the disadvantages of greater
complexity and more opportunities for clerical errors when desk computers are used.
Two separate cumulations are necessary for the term $[dy \times 10^k + dx]^2$: one for
the like (either ++ or --) and one for opposite signs (either +- or -+);
similarly, there are two cumulations for $[dy' \times 10^k + dx']^2$. However, the term
$[dy \times 10^k + dx][dy' \times 10^k + dx']$ would need eight separate cumulations:

(±±±±), (±±±∓), (±±∓±), (±∓±±), (∓±±±),
(±±∓∓), (±∓±∓), (±∓∓±);

it is probably easier to obtain these four covariance terms separately with two
cumulations,

(±±) and (±∓)


for each. From punch cards, tabulating machines can obtain the sum of
products of different signs; three cumulations will do for the “z formulas,” and ten
cumulations for the “d formulas.” One may start with separate “summary” cards
for the PS’s; these will have as their basic units either the quantities $y_{gh}$ and
$x_{gh}$; or the quantities $dy_g$ and $dx_g$; or the quantities $z_{gh}$; or conceivably the
quantities $dz_g$. Different circumstances and different resources will make for
conditions when one or another method is preferable. However, we doubt that
the final merging of the desired terms as

$$\left(\frac{dz}{x} - \frac{dz'}{x'}\right)$$

is worthwhile because it hides the three important components from separate
perusal.

Using PS “summary” cards and a tabulator and multiplying-punch, we suc- 
ceeded in reducing several fold the over-all costs of computing these variances. 
And programming the calculations for an electronic computer again reduced 
the costs several fold. The program (written by Charles E. Dean) has a great 
deal of flexibility in: 1) a large and variable number of PS contrasts; 2) the use 
of either paired PS’s (Model I), or the systematic formula (Model II), or a com- 
bination of these; 3) the use of the same or different bases (x) for different 
numerators (y). With it we can now compute for a reasonable cost the variances 
of hundreds of estimates that arise from a single survey. The program uses the 
“d formulas”; also it yields ratios to the simple random variances when the 
r’s are proportions. 


11. SOME NATIONAL AREA SAMPLES 


Most of the national samples of the Survey Research Center are based on 
interviews conducted in 66 Primary Sampling Areas (PSA’s) within each of 
which the Center maintains a trained interviewing staff. This section describes
briefly the usual kind of sample in which dwellings are assigned, through several 
stages, uniform final probabilities of selection. It also illustrates the use of 
“paired PS’s” (Model I) for complicated but common situations. The PSA’s 
are counties or pairs of counties, or metropolitan areas of a few counties. One 
PSA is selected with probability proportional to size from each of 54 strata, 
most of which comprise roughly equal populations. The other 12 PSA’s are 
“self-representing”: the 12 largest metropolitan areas, each an entire stratum; 
together these 12 areas comprise about 30% of the United States population. 

Within the PSA’s the majority of dwellings are selected in the two stages of 
blocks or segments first, and then dwellings; most blocks or segments include 
2-5 sample dwellings; in “compact” segments all dwellings are included without 
subselection. For the minority of the sample a selection of city or town, or town- 
ship or Census enumeration district, intervenes between the PS and the block; 
the sample from a city or town may include 5 to 15 dwellings. Thus, the selec- 
tion of a dwelling may depend on the four-stage selection of PSA, city, block 
and dwelling. However, in many cases some stages are suppressed by selection 
with certainty; the extreme can be reached with a single-stage selection of a 
“compact segment” in a city selected with certainty in a “self-representing” 
PSA. This flexibility is maintained for the sake of economical and practical 
field procedures. Still, the relatively simple Model I can be used to provide a 
good approximation for the variance. 

These national surveys range in size from about 1000 to about 4000 dwellings. 
The average survey may include about 1000 blocks and segments in about 250 
cities, towns, townships or enumeration districts within the 66 PSA’s. Within 
the sample dwellings the final population element may be uniquely defined, 
e.g., the head of the household; in other cases, the respondents may be selected 
at random among the adult occupants; in still other cases, data are obtained 
about all population elements within the household. 

For any of these types of surveys the same basic computational formulas 
can be used to obtain var(r—r’) =var(r)+var(r’) —2 cov(r, r’). For the sim- 
plest computations with desk computers the model might be one with L=45 
strata with m=2 selections from each, and using the “z formula” procedure 
(Model Ib, formula 7.3) : 


$$\operatorname{var}(r - r') = \frac{1}{x^2}\sum_g^{L} dz_g^2 + \frac{1}{x'^2}\sum_g^{L} dz_g'^2 - \frac{2}{xx'}\sum_g^{L} dz_g\,dz_g'.$$
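The paired-selection computation (Model I with the “z formula”) can be sketched as follows. This is a minimal illustration, not the program described in section 10; the function name, the tuple layout (y, x, y′, x′) for each PS total, and the returned dictionary keys are assumptions made for the example.

```python
def paired_ratio_variances(strata):
    """strata: list of (a, b) pairs, one pair of PS's per stratum.
    Each PS is a tuple (y, x, y2, x2) of PS totals for r = y/x and r' = y2/x2."""
    y = sum(a[0] + b[0] for a, b in strata)
    x = sum(a[1] + b[1] for a, b in strata)
    y2 = sum(a[2] + b[2] for a, b in strata)
    x2 = sum(a[3] + b[3] for a, b in strata)
    r, r2 = y / x, y2 / x2

    s_zz = s_z2z2 = s_zz2 = 0.0
    for a, b in strata:
        # dz = z_a - z_b, where z = y - r*x for each PS
        dz = (a[0] - r * a[1]) - (b[0] - r * b[1])
        dz2 = (a[2] - r2 * a[3]) - (b[2] - r2 * b[3])
        s_zz += dz * dz
        s_z2z2 += dz2 * dz2
        s_zz2 += dz * dz2

    var_r = s_zz / x ** 2
    var_r2 = s_z2z2 / x2 ** 2
    cov = s_zz2 / (x * x2)
    return {"r": r, "r2": r2, "var_r": var_r, "var_r2": var_r2,
            "cov": cov, "var_diff": var_r + var_r2 - 2.0 * cov}
```

Keeping the three components separate, rather than returning only the difference variance, follows the preference stated at the end of section 10.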


Actually, this model only approximates the selection in several ways which tend 
to exaggerate slightly the estimate of the variance: (a) the 54 PSA selections 
from as many strata, were “collapsed” pairwise (section 4b) to simulate 27 


2 We found it desirable to distinguish among the three kinds of units, PSU, PS and PSA, which often are called
by the same name of PSU. The L strata are each divided into $M_g$ PSU’s (primary sampling units); from these
are selected $m_g$ PS’s (primary selections). The PSA’s (primary sampling areas) are survey administrative areas.
Now, each of the 54 “county” PSA’s is also a single PS selected from the PSU’s of its stratum. But the 12 metropolitan
PSA’s are neither PS’s nor PSU’s; from them 14 city and 4 suburban strata were formed. In the former, blocks were
used as PSU’s and, in the latter, cities and towns; and PS’s were selected from these. It may be added that in sampling
with replacement two or more PS’s can come from the same PSU.
strata with two PS’s from each. (Moreover, the “controlled selection” of these
PSA’s also had to be disregarded.) (b) In the 12 large “self-representing”
PSA’s, the central cities were the basis for 14 strata (the five boroughs of New
York City formed three strata). In each of these strata the blocks were sorted
into two halves; thus, two “pseudo-PS’s” (section 4c) were created following
the systematic selection of PS’s. Blocks 1, 3, 5, ... formed one PS, and blocks
2, 4, 6, ..., the other. (c) The selections of suburban cities and towns were
sorted into pairs of “pseudo-PS’s” within four strata. Note that in all three
situations the computations depend on differences among the sampling units
that actually were selected first (the actual primary units): counties in (a), blocks
in (b), and cities in (c). Thus, “pseudo-PS’s” were created which are more
similar in sample size, and hence, more nearly equal in their contributions to the
variances.

Generally, the ratio r = y/x is a simple sample mean; the x is simply the count
of sample cases, for the entire sample or for the subclasses which represent the
important “domains” in the analysis of the survey results. The difference (r − r’)
represents the comparison of the means for two subclasses. Usually the
computed variances are checked against and compared to the estimate of the
variance of a simple random sample (s.r.s.) of the same size (x). This is particularly
easy to do when the mean r is a proportion, which is true for most of the
survey results. Then the simple random variance can be estimated by r(1 − r)/x.
For each computation of var(r) we generally compute the “design effect”:
e = var(r) ÷ [r(1 − r)/x]; then we plot e against the sample size x. The “design
effect” ratio for samples of size 1000 to 2000 generally ranges from 1 to 2. A
ratio of 1 means that the sample has the same “efficiency” per element as a
simple random sample. A ratio of 2 means that the same variance would have
been obtained by one-half the number of elements selected by simple random
sampling. In this case using s.r.s. computations would lead to using actual
confidence intervals of length 2/√2 s.e. = 1.4 s.e. instead of 2 s.e.; and this
would lead [6] to confidence probabilities of .164 instead of the desired and
assumed .05.
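The arithmetic of the “design effect” and of the resulting loss of confidence can be sketched as below. The function names are hypothetical, and a normal sampling distribution is assumed for the tail probability. With a design effect of 2, a nominal 2 s.e. interval computed from s.r.s. formulas is really a 2/√2 s.e. interval, and the computed two-sided tail probability comes out near 0.16, of the same order as the figure quoted in the text.

```python
import math

def design_effect(var_r, r, x):
    """Ratio of the computed variance of a proportion r to the simple
    random sampling variance r(1 - r)/x for a sample of size x."""
    return var_r / (r * (1.0 - r) / x)

def actual_error_probability(deff, k=2.0):
    """Two-sided normal tail probability actually attained when an interval
    of +/- k s.e. is built from s.r.s. formulas but the true variance is
    deff times larger."""
    z = k / math.sqrt(deff)              # true half-width in standard errors
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))
```

For instance, `design_effect(0.0004, 0.2, 800)` is about 2, and `actual_error_probability(1.0)` returns the nominal level of about .05 for a 2 s.e. interval.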

The “design effect” ratio is usually less for the smaller subclasses. It tends to 
be greater for subclasses for which residential segregation is greater. For sam- 
ples which include all adults in the household, the “design effect” may become 
very large; we have observed values of 2 to 6. 

The “design effect” of a comparison of two proportions (r − r’) can be
estimated as

$$\frac{\operatorname{var}(r) + \operatorname{var}(r') - 2\operatorname{cov}(r, r')}{r(1 - r)/x + r'(1 - r')/x'}.$$

In most of our computations these ratios tend to be larger than 1 but smaller
than

$$\frac{\operatorname{var}(r) + \operatorname{var}(r')}{r(1 - r)/x + r'(1 - r')/x'}.$$

That is, generally, the covariances have been positive, and their neglect would
generally lead to overestimates of the variance. But the covariances are not
usually large enough to cancel the “design effect”; hence the use of s.r.s.
formulas would lead to underestimates of the variance.


12. SYSTEMATIC SAMPLE OF A CITY 


This illustration is based on the Detroit Area Study of 1955. The sample of 
PS’s consisted of 82 tracts selected systematically and with probabilities pro- 
portional to 1950 population from a stratified list of Detroit Census tracts. 
From each sample tract, three blocks were selected in the second stage. In 
these blocks, the dwellings were listed and a subsample of dwellings selected in 
the third stage. There were about 12 dwellings selected on the average per tract, 
with all dwellings in the city having an equal probability of selection through 
the three stages. Altogether 958 adults were interviewed from that many dwell- 
ings; however, some of the information related to all adults in the household 
and there were 2061 of these in the entire sample. 

The model used for the computation of the variances was that of a system- 
atic sample (Model II in section 8); there were 81 “looped” differences formed 
from the L =82 PS’s. With machine computations the ten sums of cross-products 
were obtained and used in the formula (8.1—8.2): 


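$$\begin{aligned}
\operatorname{var}(r - r') ={}& \frac{1}{x^2}\,\frac{82}{162}\Big[\sum dy_g^2 + r^2\sum dx_g^2 - 2r\sum dy_g\,dx_g\Big] \\
&+ \frac{1}{x'^2}\,\frac{82}{162}\Big[\sum dy_g'^2 + r'^2\sum dx_g'^2 - 2r'\sum dy_g'\,dx_g'\Big] \\
&- \frac{2}{xx'}\,\frac{82}{162}\Big[\sum dy_g\,dy_g' + rr'\sum dx_g\,dx_g' - r'\sum dy_g\,dx_g' - r\sum dy_g'\,dx_g\Big],
\end{aligned}$$

where each sum runs over the 81 successive differences, e.g. $dy_g = y_g - y_{g+1}$.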


The ratios of the variances of r in this case generally showed little “design 
effect,” ranging mostly from 1 to 1.5. This is due to the rather small sizes of 
clusters used, and to some effective stratification of the PSU’s (the tracts). The 
larger ratios tend to occur for estimates based on all adults, and for estimates 
based on segregated social or national subclasses. 
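The successive-difference computation for a systematic sample (Model II) can be sketched as follows for a single ratio. This is an illustrative sketch in the “z” form, not the machine procedure actually used; the function name and the (y, x) tuple layout are assumptions.

```python
def systematic_ratio_variance(ps):
    """ps: list of (y, x) PS totals in the order of systematic selection.
    Returns (r, var_r) using the L - 1 successive differences."""
    L = len(ps)
    y = sum(p[0] for p in ps)
    x = sum(p[1] for p in ps)
    r = y / x
    s = 0.0
    for (ya, xa), (yb, xb) in zip(ps, ps[1:]):
        dz = (ya - yb) - r * (xa - xb)   # difference of neighboring z units
        s += dz * dz
    factor = L / (2.0 * (L - 1))         # 82/162 when L = 82
    return r, factor * s / x ** 2
```

With L = 82 tracts the list yields 81 differences and the factor 82/162 that appears in the formula of this section.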


13. SURVEYS OF CONSUMER FINANCES 


The main features of the Survey Research Center’s national sample of dwell- 
ings are described in section 11; but for these Surveys there are generally 
three basic departures: (1) Dwellings are sampled with different selection rates 
according to a double sampling device, based partly on Census information 
and mostly on the economic rating of the interviewers. This is done to achieve 
greater efficiency per interview on some financial questions and for the upper 
income groups which comprise a relatively small but important “domain” of 
the Surveys. Each punch card is weighted to compensate for the differences in
the sampling rates; by means of these weights the estimates $y_{gh}$ and $x_{gh}$ are
produced (see 3.3b). (2) For convenience in computing percentages the weights are
adjusted so that x, the total sample base of “spending units,” is made equal
to 100,000. This illustrates the flexibility available in assigning values to F,
to the individual elements and to the PS totals. (3) The survey element is the
“spending unit,” defined as those members of a family unit who pool their
income for their major items of expense. About 12% of family units and 16% of
the dwellings contained a “secondary spending unit.” (4) In computing the 
variances we use two “stratum groups” (4d and 14.16). The first stratum group 
consisting of 18 pairs of “pseudo-PS’s,” contains the 12 “self-representing” 
PSA’s, receiving the same treatment as groups (b) and (c) in section 11. How- 
ever, the 54 other PSA’s are not paired into 27 “pseudo-PS’s” as in (a) in section 
11; rather they are ordered according to stratification and the variance is com- 
puted using the L—1=53 “linked” differences of a systematic selection (4b and 
4c). Our aim is a more precise estimate of the variance. The machine computa- 
tions use the ten components for both “stratum groups”; thus (6.2): 


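$$\begin{aligned}
\operatorname{var}(r - r') ={}& \frac{1}{x^2}\big[\operatorname{var}(y) + r^2\operatorname{var}(x) - 2r\operatorname{cov}(y, x)\big] \\
&+ \frac{1}{x'^2}\big[\operatorname{var}(y') + r'^2\operatorname{var}(x') - 2r'\operatorname{cov}(y', x')\big] \\
&- \frac{2}{xx'}\big[\operatorname{cov}(y, y') + rr'\operatorname{cov}(x, x') - r'\operatorname{cov}(y, x') - r\operatorname{cov}(y', x)\big].
\end{aligned}$$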


The variances and covariances are computed according to 14.16; and letting g 
represent the paired differences in 7.1—7.2, and f represent the “linked” differ- 
ences in 8.1-8.2, we have (for three terms serving as examples for all ten terms) : 


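$$\operatorname{var}(y) = \sum_g^{18} dy_g^2 + \frac{54}{2(53)}\sum_f^{53} dy_f^2, \qquad
\operatorname{cov}(x, x') = \sum_g^{18} dx_g\,dx_g' + \frac{54}{2(53)}\sum_f^{53} dx_f\,dx_f',$$

$$\operatorname{cov}(y, x') = \sum_g^{18} dy_g\,dx_g' + \frac{54}{2(53)}\sum_f^{53} dy_f\,dx_f',$$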

with similar computations for the other seven terms. In each year about 3000 
interviews are taken. The “design effects” are generally similar to those de- 
scribed in section 11. The ratios generally run from 1 to 2.5 for the entire sam- 
ple and less for the smaller subclasses. The variance of comparison is usually 
reduced by the covariance term, but not to the extent of eliminating the 
“design effect.” 


14. DEVELOPMENT OF VAR(r) AND VAR(r—r’) 


If r = y/x where y and x are any random variables, then an approximate
estimator of the variance is given by [1, pp. 114-22; 4, Vol. II, pp. 107 ff; 7, pp.
146 ff]:

$$\operatorname{var}(r) = \frac{1}{x^2}\big[\operatorname{var}(y) + r^2\operatorname{var}(x) - 2r\operatorname{cov}(y, x)\big]. \tag{14.1}$$

It can be shown that if r’ = y’/x’ a similar approximation for the covariance of
r and r’ is

$$\operatorname{cov}(r, r') = \frac{1}{xx'}\big[\operatorname{cov}(y, y') + rr'\operatorname{cov}(x, x') - r'\operatorname{cov}(y, x') - r\operatorname{cov}(y', x)\big]. \tag{14.2}$$

Since

$$\operatorname{var}(r - r') = \operatorname{var}(r) + \operatorname{var}(r') - 2\operatorname{cov}(r, r'),$$


from the above we get 


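$$\begin{aligned}
\operatorname{var}(r - r') ={}& \frac{1}{x^2}\big[\operatorname{var}(y) + r^2\operatorname{var}(x) - 2r\operatorname{cov}(y, x)\big] \\
&+ \frac{1}{x'^2}\big[\operatorname{var}(y') + r'^2\operatorname{var}(x') - 2r'\operatorname{cov}(y', x')\big] \\
&- \frac{2}{xx'}\big[\operatorname{cov}(y, y') + rr'\operatorname{cov}(x, x') - r'\operatorname{cov}(y, x') - r\operatorname{cov}(y', x)\big].
\end{aligned} \tag{14.3}$$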


This may be expressed briefly as 


$$\operatorname{var}(r - r') = \frac{1}{x^2}\operatorname{var}(z) + \frac{1}{x'^2}\operatorname{var}(z') - \frac{2}{xx'}\operatorname{cov}(z, z').$$

The variable “z” is defined in terms of the computational units $z_{gh} = y_{gh} - r x_{gh}$
and in similar units in the “z formulas” of the variance and covariance terms.
But z itself remains undefined; specifically, it is not equal to (y − rx), which is
identically zero. For the special cases when the two ratios are based on the same
x = x’, the variance of [(y − y’)/x] can be computed either from 14.3 or from


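$$\begin{aligned}
\operatorname{var}\!\left(\frac{y - y'}{x}\right) &= \frac{1}{x^2}\left[\operatorname{var}(y - y') + \frac{(y - y')^2}{x^2}\operatorname{var}(x) - \frac{2(y - y')}{x}\operatorname{cov}(y - y', x)\right] \\
&= \frac{1}{x^2}\left\{\operatorname{var}(y) + \operatorname{var}(y') - 2\operatorname{cov}(y, y') + \frac{(y - y')^2}{x^2}\operatorname{var}(x) - \frac{2(y - y')}{x}\big[\operatorname{cov}(y, x) - \operatorname{cov}(y', x)\big]\right\}.
\end{aligned} \tag{14.4}$$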


Assume that (1) the population has been divided into L strata and the statistics

$$Fu = F\sum_g^L u_g \quad\text{and}\quad Fv = F\sum_g^L v_g$$

are used to estimate the population values

$$U = \sum_g^L U_g \quad\text{and}\quad V = \sum_g^L V_g,$$

respectively. F is some constant “inflation factor,” and the “u” and “v” stand
for any of the variates “y,” “x,” “y’,” “x’,” “z” and “z’,” as needed. The usual
estimators of the variance of u and the covariance of u with v are

$$\operatorname{var}(u) = \sum_g^L \operatorname{var}(u_g) \quad\text{and}\quad \operatorname{cov}(u, v) = \sum_g^L \operatorname{cov}(u_g, v_g). \tag{14.5}$$
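Assume further that (2) $m_g$ independent selections have been made from the
g-th stratum, with $m_g \ge 2$; (3) estimates $u_{gh}$ are prepared for the h-th selection
in the g-th stratum such that $E(F m_g u_{gh}) = U_g$; (4) the statistic
$F u_g = F\sum_h^{m_g} u_{gh}$ is the desired estimator of the stratum population value
$U_g$. Then the usual estimators of the variance of $u_g$ and the covariance of
$u_g$ with $v_g$ are

$$\operatorname{var}(u_g) = \frac{m_g}{m_g - 1}\sum_h^{m_g}\left(u_{gh} - \frac{u_g}{m_g}\right)^2 \quad\text{and}\quad
\operatorname{cov}(u_g, v_g) = \frac{m_g}{m_g - 1}\sum_h^{m_g}\left(u_{gh} - \frac{u_g}{m_g}\right)\left(v_{gh} - \frac{v_g}{m_g}\right). \tag{14.6}$$

For these basic units we define the briefer notation:

$$Du_g^2 = \sum_h^{m_g}\left(u_{gh} - \frac{u_g}{m_g}\right)^2 \quad\text{and}\quad
Du_g Dv_g = \sum_h^{m_g}\left(u_{gh} - \frac{u_g}{m_g}\right)\left(v_{gh} - \frac{v_g}{m_g}\right). \tag{14.7}$$

Combining the results of 14.1, 14.5, 14.6 and 14.7, we obtain

$$\operatorname{var}(r) = \frac{1}{x^2}\sum_g^L \frac{m_g}{m_g - 1}\big[Dy_g^2 + r^2 Dx_g^2 - 2r\,Dy_g Dx_g\big]. \tag{14.8}$$

Similarly, by using 14.2 and 14.5, 14.6 and 14.7, we obtain

$$\operatorname{cov}(r, r') = \frac{1}{xx'}\sum_g^L \frac{m_g}{m_g - 1}\big[Dy_g Dy_g' + rr' Dx_g Dx_g' - r' Dy_g Dx_g' - r\,Dy_g' Dx_g\big]. \tag{14.8}$$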
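In many situations it is more convenient to use the computational forms based
on z units defined as:

$$z_{gh} = y_{gh} - r x_{gh} \quad\text{and}\quad z_g = \sum_h^{m_g} z_{gh} = y_g - r x_g.$$

For these units we establish the relationships:

$$Dz_g^2 = \sum_h^{m_g}\left(z_{gh} - \frac{z_g}{m_g}\right)^2 = Dy_g^2 + r^2 Dx_g^2 - 2r\,Dy_g Dx_g. \tag{14.9}$$

Similarly, it can be shown also that

$$Dz_g Dz_g' = \sum_h^{m_g}\left(z_{gh} - \frac{z_g}{m_g}\right)\left(z_{gh}' - \frac{z_g'}{m_g}\right)
= Dy_g Dy_g' + rr' Dx_g Dx_g' - r' Dy_g Dx_g' - r\,Dy_g' Dx_g. \tag{14.9a}$$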
As an alternative to 14.8 we obtain the formulas

$$\operatorname{var}(r) = \frac{1}{x^2}\sum_g^L \frac{m_g}{m_g - 1} Dz_g^2 \quad\text{and}\quad
\operatorname{cov}(r, r') = \frac{1}{xx'}\sum_g^L \frac{m_g}{m_g - 1} Dz_g Dz_g'. \tag{14.10}$$


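By combining 14.1 and 14.8, we get one of the two alternative computational
forms, term by term, for var(r − r’) = var(r) + var(r’) − 2 cov(r, r’), that is,

$$\begin{aligned}
\operatorname{var}(r - r') ={}& \frac{1}{x^2}\sum_g^L \frac{m_g}{m_g - 1}\big[Dy_g^2 + r^2 Dx_g^2 - 2r\,Dy_g Dx_g\big] \\
&+ \frac{1}{x'^2}\sum_g^L \frac{m_g}{m_g - 1}\big[Dy_g'^2 + r'^2 Dx_g'^2 - 2r'\,Dy_g' Dx_g'\big] \\
&- \frac{2}{xx'}\sum_g^L \frac{m_g}{m_g - 1}\big[Dy_g Dy_g' + rr' Dx_g Dx_g' - r' Dy_g Dx_g' - r\,Dy_g' Dx_g\big].
\end{aligned} \tag{14.11}$$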


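By applying the results of 14.9 to those of 14.11 we get the other computational
form in terms of the z’s:

$$\operatorname{var}(r - r') = \frac{1}{x^2}\sum_g^L \frac{m_g}{m_g - 1} Dz_g^2 + \frac{1}{x'^2}\sum_g^L \frac{m_g}{m_g - 1} Dz_g'^2 - \frac{2}{xx'}\sum_g^L \frac{m_g}{m_g - 1} Dz_g Dz_g'. \tag{14.12}$$

If one were not interested in the three terms separately, he could compute only

$$\operatorname{var}(r - r') = \sum_g^L \frac{m_g}{m_g - 1}\, D\!\left(\frac{z_g}{x} - \frac{z_g'}{x'}\right)^2. \tag{14.13}$$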


The formulas 14.10-14.13 are general forms which could be used on most
occasions. However, from them more convenient forms are developed for special
situations. (1) When the number of PS’s per stratum is a constant $m_c \ge 2$, we
obtain Model III of section 9. This differs only in replacing, in the covariances
and variances (u = v),

$$Du_g Dv_g = \sum_h^{m_g} u_{gh} v_{gh} - \frac{u_g v_g}{m_g} \tag{14.14}$$

by

$$\sum_g^L Du_g Dv_g = \sum_g^L \sum_h^{m_c} u_{gh} v_{gh} - \frac{1}{m_c}\sum_g^L u_g v_g.$$

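(2) When $m_g = 2$ for all strata we obtain Model I of section 7 by noting that,
for h = a, b,

$$\begin{aligned}
\frac{m_g}{m_g - 1}\sum_h \left(u_{gh} - \frac{u_g}{2}\right)\left(v_{gh} - \frac{v_g}{2}\right)
&= 2\left[u_{ga}v_{ga} + u_{gb}v_{gb} - \frac{(u_{ga} + u_{gb})(v_{ga} + v_{gb})}{2}\right] \\
&= u_{ga}v_{ga} + u_{gb}v_{gb} - u_{ga}v_{gb} - u_{gb}v_{ga} \\
&= (u_{ga} - u_{gb})(v_{ga} - v_{gb}) = du_g\,dv_g,
\end{aligned} \tag{14.15}$$

which becomes $du_g^2$ when u = v.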

(3) Model II of section 8 is obtained when the systematic sampling formula 
is used ; this is derived in section 15. 

(4) Sometimes, the population is divided into a few, perhaps two or three, 
“stratum groups” within each of which the selection method and the number of 
PS’s per stratum is constant. Then the variance and covariance terms can be 
obtained, according to 14.5, by simple summations of their components within 


the “stratum groups”: 
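$$\begin{aligned}
\operatorname{var}(u) &= [\operatorname{var}(u_1) + \cdots + \operatorname{var}(u_k)] \\
\operatorname{cov}(u, v) &= [\operatorname{cov}(u_1, v_1) + \cdots + \operatorname{cov}(u_k, v_k)].
\end{aligned} \tag{14.16}$$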


The stratum group components are obtained according to the proper choice of 
computations from sections 7-9. After the components are summed into groups 
they are used in either 14.1 or 14.2. This simple method is illustrated in section 
13 for k=2. 
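The general forms 14.6 to 14.10, with any number $m_g \ge 2$ of PS’s per stratum, can be checked numerically with a sketch such as the following. The function names and the (y, x) tuple layout are assumptions for illustration; the code uses the “z” form 14.10, which by 14.9 should agree with the “d” form 14.8, and the paired case should reduce to 14.15.

```python
def D_products(u, v):
    """Sum over the PS's of one stratum of (u_h - u/m)(v_h - v/m), as in 14.7."""
    m = len(u)
    u_mean = sum(u) / m
    v_mean = sum(v) / m
    return sum((a - u_mean) * (b - v_mean) for a, b in zip(u, v))

def ratio_variance(strata):
    """strata: list of strata, each a list of (y, x) PS totals with m_g >= 2.
    Returns (r, var_r) computed from the z units z = y - r*x (formula 14.10)."""
    y = sum(p[0] for s in strata for p in s)
    x = sum(p[1] for s in strata for p in s)
    r = y / x
    total = 0.0
    for s in strata:
        m = len(s)
        z = [p[0] - r * p[1] for p in s]
        total += m / (m - 1.0) * D_products(z, z)
    return r, total / x ** 2
```

Strata with different numbers of PS’s can be mixed in one call, since the factor $m_g/(m_g - 1)$ is applied stratum by stratum.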
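The bias in the estimate r of R can be approximated following Cochran’s
[1, pp. 117-8] method, and remembering that E(x) = X/F:

$$r - R = \frac{y - Rx}{x} = \frac{y - Rx}{(X/F)\left(1 + \dfrac{x - X/F}{X/F}\right)}.$$

Thus

$$r - R \doteq \frac{y - Rx}{X/F}\left(1 - \frac{x - X/F}{X/F}\right),$$

neglecting the higher terms of the Taylor approximation. Now

$$E(y - Rx) = 0.$$

But

$$E\,y(x - X/F) = E(y - Y/F)(x - X/F) = \operatorname{Cov}(y, x)$$

and

$$E\,x(x - X/F) = E(x - X/F)^2 = \operatorname{Var}(x).$$

Hence

$$E(r - R) \doteq \frac{1}{(X/F)^2}\big[R\operatorname{Var}(x) - \operatorname{Cov}(y, x)\big].$$

And this can be estimated using sample estimates as

$$\frac{1}{x^2}\big[r\operatorname{var}(x) - \operatorname{cov}(y, x)\big] = r\left[\frac{\operatorname{var}(x)}{x^2} - \frac{\operatorname{cov}(y, x)}{yx}\right]. \tag{14.17}$$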
15. DERIVATION OF VARIANCE APPROXIMATION
FOR A SYSTEMATIC SAMPLE†


Let there be L PS’s selected systematically from L “implicit strata.” In each
PS a subsample is selected and an estimator $y_g$ of the stratum total $Y_g$ is
obtained such that $E(Fy_g) = Y_g$, where F is the constant “inflation factor.” The
sample total is

$$y = \sum_g^L y_g \quad\text{and}\quad E(Fy) = Y.$$


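Let us denote the variance among all possible subsamples within the g-th
stratum as

$$\operatorname{Var}(y_g) = E(y_g - \bar{Y}_g)^2 \quad\text{where}\quad \bar{Y}_g = Y_g/F \quad\text{and}\quad \operatorname{Var}(y) = \sum_g^L \operatorname{Var}(y_g).$$

Let us denote by $y_g$ and $y_{g+1}$ the subsamples from two “neighboring” selections,
which are assumed in the variance computations to have come from the same
“collapsed” stratum.

$$\begin{aligned}
E\sum_g^{L-1}(y_g - y_{g+1})^2 &= E\sum_g^{L-1}\big[(y_g - \bar{Y}_g) - (y_{g+1} - \bar{Y}_{g+1}) + (\bar{Y}_g - \bar{Y}_{g+1})\big]^2 \\
&= E\sum_g^{L-1}\big[(y_g - \bar{Y}_g)^2 + (y_{g+1} - \bar{Y}_{g+1})^2 + (\bar{Y}_g - \bar{Y}_{g+1})^2\big]
\end{aligned}$$

plus three covariance terms. Two of these are zero. The third,

$$E\sum_g^{L-1}(y_g - \bar{Y}_g)(y_{g+1} - \bar{Y}_{g+1}),$$

is zero on the assumption that the sampling deviations in neighboring strata
are not correlated. This is true insofar as the orderings of PSU’s within the
strata are not correlated, a precaution to be observed when using systematic
selection [1, pp. 164-7]. With this assumption

$$\begin{aligned}
E\sum_g^{L-1}(y_g - y_{g+1})^2 &= \sum_g^{L-1}\operatorname{Var}(y_g) + \sum_g^{L-1}\operatorname{Var}(y_{g+1}) + \sum_g^{L-1}(\bar{Y}_g - \bar{Y}_{g+1})^2 \\
&= [\operatorname{Var}(y) - \operatorname{Var}(y_L)] + [\operatorname{Var}(y) - \operatorname{Var}(y_1)] + \sum_g^{L-1}(\bar{Y}_g - \bar{Y}_{g+1})^2,
\end{aligned}$$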
1) 


where $y_1$ and $y_L$ stand for the “first” and the “Lth” (“last”) subsamples,
respectively. Now we multiply through by L/2(L − 1) and transpose to get

$$\begin{aligned}
\operatorname{Var}(y) ={}& \frac{L}{2(L-1)}\,E\sum_g^{L-1}(y_g - y_{g+1})^2
+ \frac{L}{2(L-1)}\left[\operatorname{Var}(y_1) + \operatorname{Var}(y_L) - \frac{2}{L}\operatorname{Var}(y)\right] \\
&- \frac{L}{2(L-1)}\sum_g^{L-1}(\bar{Y}_g - \bar{Y}_{g+1})^2.
\end{aligned} \tag{15.0}$$


† We are grateful to Benjamin J. Tepping for a derivation on which some of these results are based.


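In order to obtain var(r) of 15.3 we first obtain Var(r) = E(r − R)², where
r = y/x is an estimator of

$$R = \frac{Y}{X} = \frac{E(Fy)}{E(Fx)}.$$

We need [4, Vol. II]

$$\operatorname{Var}(r) \doteq \frac{1}{(X/F)^2}\big[\operatorname{Var}(y) + R^2\operatorname{Var}(x) - 2R\operatorname{Cov}(y, x)\big].$$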


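For Var(x) and Cov(y, x) expressions comparable to that of 15.0 for Var(y) may
be obtained; hence

$$\operatorname{Var}(r) \doteq \frac{1}{(X/F)^2}\,\frac{L}{2(L-1)}\,E\sum_g^{L-1}\big[(y_g - y_{g+1})^2 + R^2(x_g - x_{g+1})^2 - 2R(y_g - y_{g+1})(x_g - x_{g+1})\big] \tag{15.1}$$

and the following two expressions which we choose to neglect:

$$-\frac{1}{(X/F)^2}\,\frac{L}{2(L-1)}\left[\frac{2}{L}\operatorname{Var}(y - Rx) - \operatorname{Var}(y_1 - Rx_1) - \operatorname{Var}(y_L - Rx_L)\right] \tag{15.2a}$$

and

$$-\frac{1}{(X/F)^2}\,\frac{L}{2(L-1)}\sum_g^{L-1}\big[(\bar{Y}_g - R\bar{X}_g) - (\bar{Y}_{g+1} - R\bar{X}_{g+1})\big]^2. \tag{15.2b}$$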


The first of these two expressions (15.2a) tends to disappear insofar as the vari- 
ances in the first and last strata together are of average magnitude. The second 
expression (15.2b) will not generally disappear, and its necessary neglect results 
in an overestimate of the actual variance by an amount equivalent to the neg- 
lected expression. Usually, this effect will be slight since it depends on the 
difference between two parameters in successive strata; and these parameters 
in turn are deviations from average expectations. This expression (15.2b) is 
essentially the same kind as exists in any systematic procedure; in fact, in any 
procedure involving “collapsed strata” in the variance computation, when a 
single PS is selected from each stratum. 

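The parameter E(r − R)² is usually estimated by var(r) where we estimate
X/F by x and R by r; thus

$$\operatorname{var}(r) = \frac{1}{x^2}\,\frac{L}{2(L-1)}\left[\sum_g^{L-1}(y_g - y_{g+1})^2 + r^2\sum_g^{L-1}(x_g - x_{g+1})^2 - 2r\sum_g^{L-1}(y_g - y_{g+1})(x_g - x_{g+1})\right]. \tag{15.3}$$

With similar methods we also obtain

$$\begin{aligned}
\operatorname{cov}(r, r') = \frac{1}{xx'}\,\frac{L}{2(L-1)}\Big[&\sum_g^{L-1}(y_g - y_{g+1})(y_g' - y_{g+1}') + rr'\sum_g^{L-1}(x_g - x_{g+1})(x_g' - x_{g+1}') \\
&- r'\sum_g^{L-1}(y_g - y_{g+1})(x_g' - x_{g+1}') - r\sum_g^{L-1}(y_g' - y_{g+1}')(x_g - x_{g+1})\Big].
\end{aligned}$$

From formulas 15.3 and from var(r − r’) = var(r) + var(r’) − 2 cov(r, r’) we may
construct the estimators of var(r) and var(r − r’) presented in section 8. The
two terms neglected in the approximation for E[(r − r’) − (R − R’)]² are: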


L 
Lis — You) — R(X, — 


and 


2 
ACCURACY REQUIREMENTS FOR ACCEPTANCE
TESTING OF COMPLEX SYSTEMS*


C. R. Gates AND J. P. Fearey
California Institute of Technology 


A procedure for acceptance testing the constituent components of a 
complex system is considered. In this procedure, the quantity of interest 
is the total system performance, and the individual components are 
tested by a gauge which determines whether some pertinent character- 
istic lies within two limits. An imperfect gauge, in which the two limits 
are themselves random variables, is assumed. A relationship is deter- 
mined between (1) the quality of the incoming components, (2) the 
characteristics of the gauge, as specified by the accuracy of, and the 
aperture between, the limits, and (3) the quality of the tested system. 
The results show that the outgoing quality is sensitive to the aperture 
of the gauge and insensitive to the accuracy of the limits. Numerical 
results are presented, and specific examples are discussed. 


1. INTRODUCTION 


A PROBLEM encountered frequently in acceptance-testing the components
of complex systems is that of determining the accuracy required of the
testing instruments on which the acceptance-rejection decisions are based.
A common method of testing is to determine whether some pertinent
characteristic of the component lies between two limits. This “gauging problem” has
been treated by Tippett [1] and Stevens [2] for an errorless gauge. However,
the designer of the test instrument should know the tolerable error in the gauge
limits as well as the interval between these limits. Suppose, for example, the
characteristic of interest is a voltage V which is to be compared with two
“standard” voltages $V_1$ and $V_2$ to determine whether $V_1 < V < V_2$. It will be
necessary to specify not only the nominal values of $V_1$ and $V_2$ but also the
permissible error. For this example, the errors in the two gauge limits are, in
general, independent, and these errors will be assumed to be independent
throughout the paper. A much-used rule of thumb states that the instrument
inaccuracy should not exceed one-tenth of the dispersion permitted in the quantity
being controlled. In this paper, an attempt is made to provide a mathematically
sound basis for such rules.

Consider a system of N components, in which the pertinent characteristic
of the i-th component is measured by a quantity $x_i$. The error attributable to
the i-th component is assumed to be the deviation of $x_i$ from its standard value
$x_{is}$; it is assumed that the over-all system error D is the sum of the individual
component errors, or

$$D = \sum_{i=1}^{N}(x_i - x_{is}). \tag{1}$$


* This paper presents the results of one phase of research carried out at the Jet Propulsion Laboratory, Cali- 
fornia Institute of Technology, under Contract No. DA-04-495-Ord 18, sponsored by the Department of the Army, 
Ordnance Corps. 
A complex system typical of those under consideration would be a radar set,
in which D is the error in a measurement of, for example, range to a target, and
in which the i-th component contributes an (additive) error $x_i - x_{is}$.


2. FORMULATION 


It is assumed that the distribution of $x_i$ is normal, with a mean $x_{is}$ (hereafter
taken, without loss of generality, to be zero) and a standard deviation $\sigma_i$. Then
if $E(x_i x_j) = 0$ for $i \ne j$,

$$\sigma_D^2 = E(D^2) = \sum_{i=1}^{N}\sigma_i^2. \tag{2}$$

Further, it is assumed that the permissible error allotted a priori to each
component is defined by $\sigma_i = 1$. (It should be noted that the allocation of equal
permissible error to each component is not, in general, economically optimum;
however, in a specific application, the actual allocation of permissible error
could easily be used.) Thus, $\sigma_D^2 > N$ implies unacceptable quality, and $\sigma_D^2 \le N$
implies acceptable quality.

Now, as a result of the test procedure, the incoming distributions will be
altered, and new standard deviations $\sigma_i'$ will result. A satisfactory test
procedure would yield

$$(\sigma_D')^2 = \sum_{i=1}^{N}(\sigma_i')^2 \le N. \tag{3}$$


In a given complete test, each of the N distinct components will be subjected
to the gauging process. If the i-th component fails the test, subsequent tests of
the i-th type of component will be made, and the first component of the i-th
type to pass the test will be a part of the outgoing system. In the analysis to
follow, it is assumed that a fraction p of the incoming component types are
“good,” i.e., come from distributions having a standard deviation $\sigma_p = 1$, and
that a fraction q are “bad,” i.e., come from distributions having a standard
deviation $\sigma_q > 1$, where

$$p + q = 1. \tag{4}$$

A test procedure is desired in which

$$(\sigma_D')^2 = pN(\sigma_p')^2 + qN(\sigma_q')^2 \le N$$

or

$$(\sigma_T')^2 = \frac{(\sigma_D')^2}{N} = p(\sigma_p')^2 + q(\sigma_q')^2 \le 1, \tag{5}$$

where $\sigma_T'$ is the normalized standard deviation of the distribution of the sum
of $x_i$ after testing. In other words, $\sigma_T'$ is a measure of the total dispersion of
the system after testing. Thus, the parameter N is eliminated.

It now remains to define the test procedure, and to determine $\sigma_p'$ and $\sigma_q'$ in
terms of this test procedure. It is assumed that $x_i$ is compared with two limits
having centers at $x = \pm t/2$, where t is the total acceptance region (or aperture)
of the test. If $x_i$ appears to be within these limits, the component will be
accepted. However, it is also assumed that these limits are themselves
independent, normally distributed random variables, with standard deviations of $\lambda$,
where $\lambda$ corresponds to the inaccuracy of the tester. This situation, which is
illustrated in Fig. 449, corresponds to choosing random variables from three
independent populations: $\xi(-t/2; \lambda)$, $x_i(0; \sigma_i)$, and $\eta(t/2; \lambda)$. If $\xi < x_i$ and
$x_i < \eta$, then the i-th component is accepted; otherwise, it is rejected. If $\lambda \to 0$, the test
procedure truncates the distribution of $x_i$ at $\pm t/2$; otherwise, a partial or
gradual truncation occurs.

Fig. 449. Typical distributions of $x_i$ and test limits.

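In order to obtain $f_i'(x)$, the frequency function of $x_i$ after testing, first define

$$\begin{aligned}
\phi(x; \sigma_i) &= \frac{1}{\sigma_i\sqrt{2\pi}}\exp\left[-x^2/2\sigma_i^2\right] \\
\phi(x) &= \phi(x; 1) \\
\Phi(x; \sigma_i) &= \int_{-\infty}^{x}\phi(x; \sigma_i)\,dx \\
\Phi(x) &= \Phi(x; 1).
\end{aligned} \tag{6}$$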

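Now, the probability that $x_i$ lies between x and $x + \Delta x$ and will be accepted is
given by

$$\operatorname{Prob}(x \le x_i \le x + \Delta x)\,\operatorname{Prob}(\xi < x_i)\,\operatorname{Prob}(x_i < \eta).$$

But this expression is equivalent to

$$\phi(x; \sigma_i)\,\Phi[(x + t/2)/\lambda]\,\{1 - \Phi[(x - t/2)/\lambda]\}\,\Delta x. \tag{7}$$

For convenience, let

$$Q(x; t, \lambda) = \Phi[(x + t/2)/\lambda]\,\{1 - \Phi[(x - t/2)/\lambda]\}. \tag{8}$$

Then,

$$f_i'(x) = \frac{1}{P_i}\,\phi(x; \sigma_i)\,Q(x; t, \lambda), \tag{9}$$

where

$$P_i = \int_{-\infty}^{\infty}\phi(x; \sigma_i)\,Q(x; t, \lambda)\,dx \tag{10}$$

is the required normalizing factor to compensate for the truncation. Since

$$(\sigma_i')^2 = \int_{-\infty}^{\infty} x^2 f_i'(x)\,dx \tag{11}$$

and $(\sigma_p')^2$ and $(\sigma_q')^2$ can be obtained by interchanging subscripts, we have
$(\sigma_T')^2$ directly from Equations (3) and (5).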

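The average fraction accepted of the i-th component is $P_i$; the average total
fraction accepted $F_A$ is given by

$$F_A = pP_p + qP_q. \tag{12}$$

The average total fraction rejected $F_R$ is given by

$$F_R = 1 - F_A. \tag{13}$$

For clarity, the complete expressions for $F_A$ and $(\sigma_T')^2$ are given below:

$$F_A = p\int \phi(x; \sigma_p)\,Q(x; t, \lambda)\,dx + q\int \phi(x; \sigma_q)\,Q(x; t, \lambda)\,dx \tag{14}$$

$$(\sigma_T')^2 = p\,\frac{\int x^2\phi(x; \sigma_p)\,Q\,dx}{\int \phi(x; \sigma_p)\,Q\,dx} + q\,\frac{\int x^2\phi(x; \sigma_q)\,Q\,dx}{\int \phi(x; \sigma_q)\,Q\,dx} \tag{15}$$

Equations (14) and (15) have been programmed on a digital computer, using
various values of p, $\sigma_p$, q, $\sigma_q$, $\lambda$, and t; the results are presented in the following
section.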
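Equations (14) and (15) can be evaluated numerically along the following lines. This is a minimal sketch, not the authors’ program: the function names, the midpoint-rule integration, the integration limits, and the step count n are all assumptions chosen for illustration.

```python
import math

SQRT2 = math.sqrt(2.0)

def phi(x, sigma):
    """Normal frequency function with zero mean and standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def Q(x, t, lam):
    """Probability that a component with value x is accepted, Equation (8)."""
    lower_ok = 0.5 * (1.0 + math.erf((x + t / 2.0) / (lam * SQRT2)))   # Prob(xi < x)
    upper_ok = 0.5 * math.erfc((x - t / 2.0) / (lam * SQRT2))          # Prob(x < eta)
    return lower_ok * upper_ok

def accepted_moments(sigma, t, lam, n=20000):
    """Return (P, variance after test) for one incoming population,
    i.e. Equations (10) and (11), by the midpoint rule."""
    half = min(t / 2.0 + 8.0 * lam, 10.0 * sigma)   # integrand is negligible beyond
    h = 2.0 * half / n
    m0 = m2 = 0.0
    for j in range(n):
        x = -half + (j + 0.5) * h
        w = phi(x, sigma) * Q(x, t, lam)
        m0 += w
        m2 += x * x * w
    return m0 * h, m2 / m0

def gauge_test(q, sigma_q, t, lam, sigma_p=1.0):
    """Return ((sigma_T')^2, F_R) from Equations (13), (14) and (15)."""
    p = 1.0 - q
    P_p, var_p = accepted_moments(sigma_p, t, lam)
    P_q, var_q = accepted_moments(sigma_q, t, lam)
    F_A = p * P_p + q * P_q
    return p * var_p + q * var_q, 1.0 - F_A
```

A call such as `gauge_test(0.25, 5.0, 4.3, 0.3)` returns the outgoing normalized variance and the fraction rejected for one choice of incoming quality and gauge characteristics.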


3. DISCUSSION OF RESULTS 


In Figs. 451 to 453, $(\sigma_T')^2$ is plotted against the aperture t for a wide range of
the controlling parameters. Each figure corresponds to a certain value of $\lambda$,
the dispersion of the test instrument, and there are curves for q, the fraction of
the incoming equipment which is defective, equal to 0, 0.1, 0.25, 0.5, 0.75, and
0.9 and for $\sigma_q$, the dispersion of the defective equipment, equal to 2, 5, and 10.

As one would expect, poor incoming quality, characterized by large q or $\sigma_q$,
tends to result in poor outgoing quality. Also, as the aperture t becomes larger,
$(\sigma_T')^2$ goes up, and increasing $\lambda$, the meter dispersion, results in larger $(\sigma_T')^2$,
although the dependence on $\lambda$ is much weaker.

The next set of curves, Figs. 454 to 456, presents $F_R$ for the same range and
values of the controlling parameters. As expected, the weaker test,
characterized by large $\lambda$ and t, rejects a smaller fraction.

Fig. 451. Variance of distribution after test $(\sigma_T')^2$ vs test aperture t ($\lambda$ = 0.1).

Figure 457 presents loci of ($\lambda$, t) values which, for some incoming quality,
assure that the outgoing quality is barely satisfactory. In other words, the
equation

$$(\sigma_T')^2(\lambda, t;\, q, \sigma_q) = 1$$

is plotted, treating q and $\sigma_q$ as parameters and $\lambda$ and t as variables. Thus, for
example, if the incoming quality is characterized by q = 0.25 and $\sigma_q$ = 5, then
($\lambda$, t) pairs such as ($\lambda$ = 0.3, t = 4.3), ($\lambda$ = 0.5, t = 4.2), etc., assure that the
outgoing quality will be satisfactory, but no better than satisfactory: i.e., that
$(\sigma_T')^2 = 1$. Of course, ($\lambda$, t) pairs above the curves imply that $(\sigma_T')^2 > 1$, and
vice versa.

It is especially significant to note, in Fig. 457, the relatively weak dependence
on $\lambda$ and the relatively strong dependence on $t$. Thus, if $\lambda$ is changed from 0.1 to
0.5, which corresponds to a change in the inaccuracy of the testing meter
from one-tenth of the expected component dispersion to one-half of the
expected component dispersion, the required values of $t$ change but little, or, in
some cases, not at all; on the other hand, a change in the aperture $t$ from 3 to 4,
for example, sharply influences the required $\lambda$ values.


Fig. 452. Variance of distribution after test $(\sigma_T')^2$ vs test aperture $t$ ($\lambda = 0.5$).

Figure 458 shows $F_R$ vs $\lambda$ for the $t$ values taken from Fig. 457. In other
words, in Fig. 458, $t$ has been chosen for each point on each curve so that
$(\sigma_T')^2 = 1$. Again, it is seen that the dependence on $\lambda$ is not strong, particularly
in the region $0.1 < \lambda < 0.5$.

From Figs. 457 and 458, we can choose specific test procedures, characterized
by a $(\lambda, t)$ pair. Of course, a priori information on expected incoming quality
would be helpful in choosing a test. However, values of $\lambda$ near 0.5 and values of
$t$ between 3.5 and 5 seem desirable. As an illustration, two $(\lambda, t)$ pairs which may
prove acceptable in practice have been chosen: $(\lambda = 0.5, t = 4)$ and $(\lambda = 0.7,
t = 3.5)$. For these values, $(\sigma_T')^2$ and $F_R$ have been plotted in Figs. 459 and 461
and in Figs. 460 and 462a, respectively. That $(\sigma_T')^2$ and $F_R$ are linear functions
of $q$ can be seen from Equations (14) and (15). The two test procedures assure
that $(\sigma_T')^2 < 1$ for $q$ values up to 0.3 or 0.4. The fractions rejected are
unfortunately large, even for $q = 0$.


Fig. 453. Variance of distribution after test $(\sigma_T')^2$ vs test aperture $t$ ($\lambda = 1.0$).

Fig. 455. Fraction rejected $F_R$ vs test aperture $t$ ($\lambda = 0.5$).

Fig. 457. Loci of $(\lambda, t)$ pairs for $(\sigma_T')^2 = 1$.


4. CONCLUSIONS 


A model has been constructed, in which the individual components of a complex
system are tested in a go-no-go device containing errors in the test limits,
and in which the measure of outgoing quality is the sum of the individual
component errors. For this model, it is shown that the aperture $t$ between the test
limits is a sensitive parameter and should be on the order of 4 or 5 times the
expected component error. Values of $\lambda$ on the order of one-half of the expected
component error appear best.

It is of interest to note that, if data were available regarding the distribution
of incoming quality, the cost of components, risks associated with poor outgoing
quality, the cost of test meters as a function of their accuracy, etc., the
technique used in this paper could be extended to calculate cost and risk functions,
so as to provide an analytic method for choosing a test procedure.


Fig. 458. Fraction rejected $F_R$ vs test-instrument dispersion $\lambda$ for the
values of $t$ yielding $(\sigma_T')^2 = 1$.


5. APPENDIX 


By a slight modification of the mathematics given in the discussion of formu- 
lation, a closely related problem can be solved; for clarity and general interest, 
this problem is described here. 

Figure 462b presents a schematic representation of the problem previously
considered, contrasted with the problem discussed in this Appendix. In the
problem treated previously, Fig. 462b, case (a), two distributions (see Equation
9) are tested separately, and a sufficient number of samples are tested to
preserve the fractions $p$ and $q$ of the original component types; after testing, the
two populations are combined to form the outgoing product. In Fig. 462b, case
(b), the related problem is illustrated, in which the populations, in proportions
$p$ and $q$, are mixed before testing. If we define


$\delta_i^2 = \int x^2 \phi(x; \sigma_i)\, Q\, dx$ (A-1)


then we have, from Equations (5), (10), and (14),

$[\sigma_{T(a)}']^2 = (\sigma_T')^2 = p\delta_p^2/P_p + q\delta_q^2/P_q$. (A-2)


Fig. 459. Variance of distribution after test $(\sigma_T')^2$ vs defective fraction
of incoming equipment $q$ ($\lambda = 0.5$, $t = 4$).


However, for case (b), since the frequency function of the population delivered
to the tester is given by


$f_T(x) = p\phi(x; \sigma_p) + q\phi(x; \sigma_q)$ (A-3)


then

$[\sigma_{T(b)}']^2 = \frac{p\delta_p^2 + q\delta_q^2}{F_A}$ (A-4)

where $F_A$ is defined by Equation (12). The fractions defective $F_R$ for each case
are the same.


Fig. 460. Fraction rejected $F_R$ vs defective fraction
of incoming equipment $q$ ($\lambda = 0.5$, $t = 4$).

Case (b) corresponds to a situation in which components such as, for example, 
electronic parts, from two presumed identical, but in fact different, distribu- 
tions, are combined before submission to the type of test device described ear- 
lier. 

In analogy with Fig. 457, Fig. 463 gives a plot of the equation 


$[\sigma_{T(b)}']^2(\lambda; t; q, \sigma_q) = 1$ (A-5)


which, for various values of $q$ and $\sigma_q$, describes loci of $(\lambda, t)$ pairs which yield
acceptable outgoing quality. A general similarity between Fig. 457 and Fig. 463 is
evident.


Fig. 461. Variance of distribution after test $(\sigma_T')^2$ vs defective fraction
of incoming equipment $q$ ($\lambda = 0.7$, $t = 3.5$).


NOMENCLATURE 


$D$ = over-all system error.
$E$ = expectation.
$F_A$ = total fraction accepted, or probability of acceptance.
$F_R$ = total fraction rejected.
$N$ = number of components which contribute error to a complex system.
$P_i$ = fraction accepted of the i-th component (i.e., the normalizing factor
which compensates for the truncation).
$p$ = that fraction of the incoming components which comes from
distributions having a standard deviation of $\sigma_p$ (i.e., the fraction
acceptable).
$q$ = that fraction of the incoming components which comes from
distributions having a standard deviation of $\sigma_q$ (i.e., the fraction defective).
$t$ = total acceptance region (or aperture) of the test.
$x_i$ = quantity which measures the pertinent characteristic of a component
in a complex system.


Fig. 462a. Fraction rejected $F_R$ vs defective fraction of
incoming equipment $q$ ($\lambda = 0.7$, $t = 3.5$).
x;,=standard value of 2;. 
$\delta$ = quantity defined by Equation (A-1).
$\eta$, $\xi$ = random variables.
$\lambda$ = standard deviation of the go-no-go limits (i.e., $\lambda$ measures the
inaccuracy of the tester).
$\sigma_i$ = standard deviation of the distribution of $x_i$ before testing.
$\sigma_i'$ = standard deviation of the distribution of $x_i$ after testing.
$\sigma_p$ = standard deviation associated with a distribution of incoming
components which is acceptable (i.e., $\sigma_p = 1$).
$\sigma_q$ = standard deviation associated with a distribution of incoming
components which is not acceptable (i.e., $\sigma_q > 1$).
oy’ =standard deviation of the distribution of the sum of the distributions 
of x; after testing, unnormalized. 
$\sigma_T'$ = standard deviation of the distribution of the sum of the distributions
of $x_i$ after testing, normalized (i.e., $\sigma_T'$ is a measure of the total
dispersion of the system after testing).
$\Phi(x; \sigma_i)$ = normal distribution function.
$\phi(x; \sigma_i)$ = normal frequency function.
$Q$ = quantity defined by Equation (8).


CORRELATION BETWEEN SAMPLE MEANS
AND SAMPLE RANGES


BERNARD OSTLE AND GEORGE P. STECK
Sandia Corporation


The correlation between the mean and the range is investigated for 
sampling from symmetric and asymmetric populations. It is shown that 
when the sampling is from any symmetric population with finite vari- 
ance the correlation is zero. However, zero correlation does not imply 
symmetry. The results are of interest in the field of statistical quality
control.


1. INTRODUCTION


In the application of statistical quality control techniques, one valuable
result is the interpretation of patterns of variation exhibited by control charts.
Some interpretations that have been encountered are:

(a) When a series of samples are plotted on control charts and the $\bar{X}$ and $R$
points do not tend to follow each other, either directly or inversely, the
samples are probably from a normal (or at least symmetric) population.

(b) If $\bar{X}$ and $R$ are positively correlated, that is, if $\bar{X}$ and $R$ tend to rise and
fall in unison, the samples are probably from a positively skewed
population.

(c) If $\bar{X}$ and $R$ are negatively correlated, that is, if $\bar{X}$ and $R$ tend to move
inversely to one another, the samples are probably from a negatively
skewed population.

The preceding possible interpretations lead one to consider the following
questions:
(i) If random samples are taken from a symmetric population, are the sam- 
ple mean and sample range uncorrelated? 

(ii) Is the converse to question (i) true? 

(iii) If random samples are taken from a positively (negatively) skewed 
population, are the sample mean and sample range positively (nega- 
tively) correlated? 

In the following sections it will be shown that the answer to question (i) is 
Yes; to question (ii), No; and to question (iii), No. Since the examples which 
provide the answers to questions (ii) and (iii) are discrete and one is clearly not 
unimodal in any sense, it is natural to pose an additional question: 

(iv) If conditions of continuity and/or unimodality of the population
sampled are imposed, do the answers to questions (ii) and (iii) remain “No”?

The answer to question (iv) is not known; investigation of this question is in 
progress. 


2. PREVIOUS WORK 


Daly [1] proved that for a random sample from a normal population the
mean and range are independently distributed. This, of course, says that in the
normal case $\bar{X}$ and $R$ are uncorrelated. The converse to Daly’s theorem may
be stated as follows: The independence of the mean and the range of a sample
of $n$ independently and identically distributed random variables implies
normality of the parent population. A proof of this theorem, subject to some
assumptions concerning the moments of the parent distribution, has been given
by Lukacs and King [3]. However, a partial converse to Daly’s theorem, namely,
that zero covariance between the mean and range implies normality, is not
true, as shown by a counter-example given later in this paper. Gumbel and
Carlson [2] studied the asymptotic covariance between $\bar{X}$ and $s$ when sampling
from a population for which all moments are finite. Their result is that
$\mathrm{cov}(\bar{X}, s) \cong \mu_3/2n\sigma$, $n$ large, where $\mu_3$ is the third central moment.


3. SYMMETRIC POPULATIONS 


In this section, the following theorem is proved: 
If a random variable is symmetrically distributed with finite variance, 
then the mean and range of a random sample are uncorrelated. 
Let the random variable $X$ be distributed according to the distribution
function $F(x)$. With no loss of generality, it may be assumed that $\mu = E(X) = 0$. The
assumption of symmetry is then expressed by $F(-x) = 1 - F(x)$ except at the
points of discontinuity of $F$, which are at most denumerable. Let $(X_1, X_2, \cdots,
X_n)$ be a random sample of size $n$ in which $X_1$ is the first observation, $X_2$ is the
second observation, $\cdots$, $X_n$ is the last observation. Further, denote the
smallest sample value by $U$, the largest sample value by $V$, and define the sample
range by $R = V - U$.
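The covariance between $\bar{X}$ and $R$ is


$\sigma_{\bar{X}R} = E\{[\bar{X} - E(\bar{X})] \cdot [R - E(R)]\} = E(\bar{X}R)$. (1)

This may be expressed as


$E(\bar{X}R) = E\{\sum_{i=1}^{n} X_i R/n\}$
$= E\{X_1(V - U)\}$ (2)
$= E(X_1 V) - E(X_1 U)$
$= E\{X_1 E(V | X_1)\} - E\{X_1 E(U | X_1)\}$.

To evaluate these conditional expectations, it is necessary to know the
conditional distribution of $V$ given $X_1 = t$ and the conditional distribution of $U$ given
$X_1 = t$. These are found to be:


$P\{V \le y | X_1 = t\} = 0$ for $y < t$; $[F(t)]^{n-1}$ for $y = t$; $[F(y)]^{n-1}$ for $y > t$ (3a)


and


$P\{U \le y | X_1 = t\} = 1 - [1 - F(y)]^{n-1}$ for $y < t$; $1$ for $y = t$; $1$ for $y > t$. (3b)


Therefore,


$E(V | X_1 = t) = t[F(t)]^{n-1} + (n - 1) \int_{t}^{\infty} y[F(y)]^{n-2}\, dF(y)$ (4a)


and

$E(U | X_1 = t) = t[1 - F(t)]^{n-1} + (n - 1) \int_{-\infty}^{t} y[1 - F(y)]^{n-2}\, dF(y)$. (4b)

If the integrals in (4) are integrated by parts, it is possible to show that


$E(V | X_1 = t) = \int_{0}^{\infty} \{1 - [F(y)]^{n-1}\}\, dy + \int_{0}^{t} [F(y)]^{n-1}\, dy$ (5a)

and

$E(U | X_1 = t) = -\int_{-\infty}^{0} \{1 - [1 - F(y)]^{n-1}\}\, dy + \int_{0}^{t} [1 - F(y)]^{n-1}\, dy$. (5b)


Consequently, since the first integrals in (5a) and (5b) are constants, and since
$E(X_1) = 0$ by assumption,


$E\{X_1 E(V | X_1)\} = \int_{-\infty}^{\infty} t\, dF(t) \int_{0}^{t} [F(y)]^{n-1}\, dy$ (6a)


$E\{X_1 E(U | X_1)\} = \int_{-\infty}^{\infty} t\, dF(t) \int_{0}^{t} [1 - F(y)]^{n-1}\, dy$. (6b)

Hence, putting $y = wt$,


$E(\bar{X}R) = \int_{-\infty}^{\infty} t^2\, dF(t) \int_{0}^{1} \{[F(wt)]^{n-1} - [1 - F(wt)]^{n-1}\}\, dw$. (7)


Now, if $F$ corresponds to a symmetric distribution, then $F(wt) = 1 - F(-wt)$
except for a denumerable set $\{w_i\}$ such that $w_i t$ are the points of discontinuity
of $F$. Hence, the integral in the brace in (7) is an odd function of $t$. Therefore,
$E(\bar{X}R) = 0$ and it has been shown that symmetry implies $E(\bar{X}R) = 0$.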
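The theorem can be checked by brute force on a small discrete population. The sketch below is an illustration only: it enumerates every ordered sample of size n, uses exact rational arithmetic, and returns the covariance of the sample mean and sample range. The function name `cov_mean_range` and both example populations are assumptions chosen for the illustration; they do not appear in the paper.

```python
from fractions import Fraction
from itertools import product

def cov_mean_range(values, probs, n):
    """Exact Cov(sample mean, sample range) for i.i.d. samples of size n."""
    e_mean_range = Fraction(0)
    e_mean = Fraction(0)
    e_range = Fraction(0)
    for idx in product(range(len(values)), repeat=n):
        weight = Fraction(1)
        for i in idx:
            weight *= probs[i]
        sample = [values[i] for i in idx]
        mean = Fraction(sum(sample), n)
        spread = max(sample) - min(sample)
        e_mean_range += weight * mean * spread
        e_mean += weight * mean
        e_range += weight * spread
    return e_mean_range - e_mean * e_range

# A symmetric population (illustrative values), samples of size 4.
sym_values = [-3, -1, 0, 1, 3]
sym_probs = [Fraction(1, 10), Fraction(1, 5), Fraction(2, 5),
             Fraction(1, 5), Fraction(1, 10)]
cov_sym = cov_mean_range(sym_values, sym_probs, n=4)

# A skewed two-point population, samples of size 2, for contrast.
skew_values = [0, 1]
skew_probs = [Fraction(3, 4), Fraction(1, 4)]
cov_skew = cov_mean_range(skew_values, skew_probs, n=2)
```

For the symmetric population the covariance is exactly zero, as the theorem requires; the skewed two-point population gives a nonzero value, which shows that the enumeration is not trivially returning zero.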


4. SKEWED POPULATIONS 


It has been shown above that symmetry is sufficient to imply $E(\bar{X}R) = 0$.
It is natural to inquire whether this condition is also necessary. However,
symmetry is not necessary for zero correlation, as is shown by the following example.

Let $X$ be a random variable which assumes the values $-4, -2, -1, 0, 1, 2, 4$
with probabilities $p_1, \cdots, p_7$, respectively. If the conditions
(i) $\sum_{i=1}^{7} p_i = 1$, (ii) $E(X) = 0$, (iii) $E(X^2) = 1$, and (iv) $E(X^3) = S$ are imposed,
then the distribution of $X$ can be characterized by four parameters, say $p_3$, $p_4$,
$p_5$, and $S$.

Consider a random sample $(X_1, X_2)$ from the population specified by the
distribution given in the preceding paragraph.

In this case


$E(\bar{X}R) = E\{(X_1 + X_2)|X_2 - X_1|\}/2$ (8)

and we shall denote this expected value by $\xi$.
Defining the quantities $A$, $B$, $C$ as

$A = 1 - p_3 - p_4 - p_5$
$B = p_3 - p_5$
$C = 1 - p_3 - p_5$,


it can be shown that

$24\xi = 9S + 45B - 2SC - 18BC + 8AS - 72AB$. (9)

If

$A = 0.1$
$B = -0.05$
$C = 0.5$
$S = 9/55$,

then

$p_3 = 0.225$
$p_4 = 0.4$
$p_5 = 0.275$

and


$p_1 = -(S - 3B - 4C + 16A)/96 = 0.0008996$
$p_2 = (S - 15B - 2C + 32A)/48 = 0.0648674$
$p_6 = -(S - 15B + 2C - 32A)/48 = 0.0267992$
$p_7 = (S - 3B + 4C - 16A)/96 = 0.0074337$.


For this distribution, which is clearly asymmetric, it can be shown from (9)
that $\xi = 0$. Thus, the fact that $E(\bar{X}R) = 0$ does not imply that the underlying
distribution is symmetric and hence answers question (ii) in the negative.
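The counter-example can be verified exactly. The sketch below is an illustration, not the authors' computation: it rebuilds the seven probabilities from A, B, C, and S, taking B = p3 - p5 and C = 1 - p3 - p5 (the readings that reproduce the printed decimals), checks the four moment conditions, and evaluates the pair expectation by enumerating all 49 ordered pairs. The function names are assumptions introduced for the illustration.

```python
from fractions import Fraction
from itertools import product

VALUES = [-4, -2, -1, 0, 1, 2, 4]

def probabilities(A, B, C, S):
    """Return [p1, ..., p7] from A, B, C, S (B = p3 - p5, C = 1 - p3 - p5)."""
    p3 = (1 - C + B) / 2
    p5 = (1 - C - B) / 2
    p4 = 1 - A - p3 - p5
    p1 = -(S - 3 * B - 4 * C + 16 * A) / 96
    p2 = (S - 15 * B - 2 * C + 32 * A) / 48
    p6 = -(S - 15 * B + 2 * C - 32 * A) / 48
    p7 = (S - 3 * B + 4 * C - 16 * A) / 96
    return [p1, p2, p3, p4, p5, p6, p7]

def moment(probs, k):
    """k-th moment about zero of the seven-point distribution."""
    return sum(p * x ** k for p, x in zip(probs, VALUES))

def pair_expectation(probs):
    """E{(X1 + X2)|X1 - X2|} for an independent pair, by enumeration."""
    pairs = list(zip(probs, VALUES))
    return sum(pa * pb * (xa + xb) * abs(xa - xb)
               for (pa, xa), (pb, xb) in product(pairs, repeat=2))

A, B, C, S = Fraction(1, 10), Fraction(-1, 20), Fraction(1, 2), Fraction(9, 55)
probs = probabilities(A, B, C, S)
xi = pair_expectation(probs)
rhs_eq9 = 9 * S + 45 * B - 2 * S * C - 18 * B * C + 8 * A * S - 72 * A * B
```

Because the expectation vanishes here, the check does not depend on whether the factor 1/2 in (8) is included in the quantity denoted by the symbol xi.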

Also of interest, since it provides an answer to question (iii) posed earlier,
is the following fact: the sign of $E(\bar{X}R)$ does not depend on the sign of the
skewness of the underlying distribution.

An example of a situation in which $E(\bar{X}R)$ is negative and the skewness is
positive is as follows. In the example already considered (NOTE: skewness $= S$),
let


$A = 0.125$
$B = -0.171875$
$C = 0.671875$
$S = 0.0625$.

Then

$p_3 = 0.078125$    $p_1 = 0.0011393$
$p_4 = 0.546875$    $p_2 = 0.1103516$
$p_5 = 0.25$        $p_6 = 0.0003255$
                    $p_7 = 0.0131836$

and


$\xi = -0.1486612$.


The example given above shows that it is possible for samples from a distribu- 
tion with positive skewness to have a mean and range which are negatively cor- 
related. However, the example exhibiting this property has three modes. It 
is possible that for a unimodal distribution a positive correlation between the 
sample mean and the sample range implies positive skewness. Investigation of 
this is in progress, but no results have yet been obtained. 
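The second example can be checked in the same way. The sketch below is illustrative only; it again assumes B = p3 - p5 and C = 1 - p3 - p5, and the helper name `probabilities` is introduced here rather than taken from the paper. It confirms the positive third moment, counts the modes, and evaluates the pair expectation exactly. Under these assumptions the printed value of xi is reproduced by E{(X1 + X2)|X1 - X2|}; the covariance of mean and range is half of that value, which leaves its sign unchanged.

```python
from fractions import Fraction
from itertools import product

VALUES = [-4, -2, -1, 0, 1, 2, 4]

def probabilities(A, B, C, S):
    """Return [p1, ..., p7] from A, B, C, S (B = p3 - p5, C = 1 - p3 - p5)."""
    p3 = (1 - C + B) / 2
    p5 = (1 - C - B) / 2
    p4 = 1 - A - p3 - p5
    p1 = -(S - 3 * B - 4 * C + 16 * A) / 96
    p2 = (S - 15 * B - 2 * C + 32 * A) / 48
    p6 = -(S - 15 * B + 2 * C - 32 * A) / 48
    p7 = (S - 3 * B + 4 * C - 16 * A) / 96
    return [p1, p2, p3, p4, p5, p6, p7]

probs = probabilities(Fraction(1, 8), Fraction(-11, 64),
                      Fraction(43, 64), Fraction(1, 16))

# Third moment about zero; equals the skewness since E(X) = 0 and E(X^2) = 1.
third_moment = sum(p * x ** 3 for p, x in zip(probs, VALUES))

# E{(X1 + X2)|X1 - X2|} over all ordered pairs of an independent sample.
pairs = list(zip(probs, VALUES))
xi = sum(pa * pb * (xa + xb) * abs(xa - xb)
         for (pa, xa), (pb, xb) in product(pairs, repeat=2))

# Count local maxima of the probability function (end points included).
padded = [Fraction(0)] + probs + [Fraction(0)]
n_modes = sum(1 for k in range(1, 8)
              if padded[k] > padded[k - 1] and padded[k] > padded[k + 1])
```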

In any case, a necessary and sufficient condition for $E(\bar{X}R) > 0\ (< 0)$ is given
by


$\int_{-\infty}^{\infty} t^2\, dF(t) \int_{0}^{1} \{[F(wt)]^{n-1} - [1 - F(wt)]^{n-1}\}\, dw > 0\ (< 0)$.


5. SUMMARY 


It has been shown that symmetry of the parent population implies that the 
sample mean and sample range are uncorrelated. The converse, however, is not 
true. This suggests that some modification of inferences currently being made 
in the interpretation of control chart data may be in order. It has also been 
proved (see Appendix) that symmetry of the parent population implies that 
the sample range and midrange are uncorrelated. 


APPENDIX 


An alternative proof for symmetric continuous distributions may be obtained
if one considers the ordered sample $X_1 < X_2 < \cdots < X_n$. Incidentally, this
proof also establishes that symmetry of the parent population implies the
sample range and midrange are uncorrelated. The extension to the discrete case
is omitted for reasons of simplicity. It is clear that

$\sigma_{\bar{X}R} = E(\bar{X}R) = E\{(X_1 + \cdots + X_n)(X_n - X_1)/n\}$

$= (1/n) \sum_{r=1}^{n} E(X_n X_r - X_1 X_{n-r+1})$. (1)


Consider first the case where $r = n$, that is,

$E(X_n^2 - X_1^2) = E(X_n^2) - E(X_1^2) = E(X_n^2) - E[(-X_1)^2]$

$= E(U^2) - E(V^2)$ (2)

where $U = X_n$ and $V = -X_1$. The pdf’s of $U$ and $V$ are, respectively,

$g(u) = n[F(u)]^{n-1} f(u)$ (3)

and

$h(v) = n[1 - F(-v)]^{n-1} f(-v)$. (4)

Invoking the assumption of symmetry,

$h(v) = n[F(v)]^{n-1} f(v)$ (5)

and thus

$E(U^2) = E(V^2)$. (6)


This shows that, for symmetric populations, the range and midrange are
uncorrelated. Consider next the case where $r \ne n$, that is,


$E(X_n X_r - X_1 X_{n-r+1}) = E(X_n X_r) - E(X_1 X_{n-r+1})$ (7)
$= E(UV) - E(WZ)$

where

$U = X_n$, $V = X_r$, $W = -X_1$, and $Z = -X_{n-r+1}$.


Following the same procedure as before we have


$g(u, v) = \frac{n!}{(r-1)!\,(n-r-1)!} [F(v)]^{r-1} [F(u) - F(v)]^{n-r-1} f(u) f(v)$ (8)

and

$h(w, z) = \frac{n!}{(r-1)!\,(n-r-1)!} [1 - F(-z)]^{r-1} [F(-z) - F(-w)]^{n-r-1} f(-w) f(-z)$

$= \frac{n!}{(r-1)!\,(n-r-1)!} [F(z)]^{r-1} [F(w) - F(z)]^{n-r-1} f(w) f(z)$.

Thus,


$E(UV) = E(WZ)$ and $\rho = \sigma_{\bar{X}R}/\sigma_{\bar{X}}\sigma_R = 0$.




REFERENCES 


[1] Daly, J. F., “On the Use of the Sample Range in an Analogue of Student’s t-test,”
Annals of Mathematical Statistics, 17 (1946), pp. 71-4.

[2] Gumbel, E. J., and Carlson, P. G., “On the Asymptotic Covariance of the Sample
Mean and the Sample Standard Deviation,” Metron, 18 (1956), pp. 113-9.

[3] Lukacs, E., and King, E. P., “A Property of the Normal Distribution,” Annals of 
Mathematical Statistics, 25 (1954), pp. 389-94. 


PROBLEMS IN MENTAL TEST THEORY ARISING 
FROM ERRORS OF MEASUREMENT 


FREDERIC M. LORD
Educational Testing Service 


Several unsolved basic problems of mental test theory, arising from 
the presence of errors of measurement in the test scores, are discussed. 
The inadequacies of the classical model, with its normally and inde- 
pendently distributed errors, are pointed out. Two newer stochastic 
models, available for dealing with these problems, are described. It is 
observed that the proportion of test questions answered correctly by 
the examinee is by itself usually not a satisfactory estimate of the cor- 
responding population proportion. Methods for estimating such popu- 
lation proportions are considered. Such methods are required in order 
(i) to determine whether or not two tests measure the same dimension, 
(ii) to measure changes in the characteristics of the examinees. 


THE psychometrician frequently wonders why modern statistical theory
fails to provide ready-made, satisfactory statistical methods for answering
certain rather basic questions that he would like to ask. Perhaps the professional 
statistician is also puzzled as to why the psychometrician thinks he has so 
many exceptional problems. Or, going further, the statistician may feel some 
doubt in his mind as to whether the psychometrician’s problems are of a sta- 
tistical nature at all. Are test scores really numerical measurements to which 
the usual arithmetical operations can usefully be applied, or should they be 
considered merely as nonmetric symbols which have few if any of the more use- 
ful properties of the number system? 

The plan of the present article is simply to outline a few very basic problems 
of mental test theory that seem to be well within the present grasp of modern 
statistics, but that have not yet been completely solved and cozily packaged. 
The selection of problems for discussion is dictated by the writer’s interests— 
it is restricted to certain basic problems arising from the existence in almost all 
test scores of substantial errors of measurement. The purpose is not primarily to 
expound, but to stimulate interest in and active development of the statistical 
theory and methods needed in this area of mental test theory. 

There follows first of all a section discussing the meaning and utility of the 
notions “true score” and “error of measurement” (these notions should be more 
acceptable to most statisticians than they are to many psychologists). The next
two sections outline two usable statistical models for the relationships between 
actual scores, true scores, and errors of measurement. The fourth and fifth 
sections discuss some possible approaches to the basic problems of estimating 
the examinees’ true scores. The last two sections outline two important practical 
problems, the solution of which requires the making of inferences about true 
scores. 


1. THE “TRUE SCORE” AND THE ERROR OF MEASUREMENT 


A mental test is a collection of tasks; the examinee’s performance on these is
taken as an index of his standing along some psychological dimension. A
conveniently noncontroversial example is a test of spelling ability. Conventionally, the
examiner chooses n words from the dictionary, requires the examinees to spell
each of them, and uses the number (or proportion) of words spelled correctly
by the examinee as his test score, representing, in some useful sense, his spelling
ability. (It is probable that theoretically better indices of “spelling ability”
could be devised, but this is outside the scope of the present discussion.)

Many equally satisfactory sets of n words could be chosen by the examiner to 
constitute a test of spelling ability. Such equally satisfactory sets will be called 
“parallel tests.” 

The basic trouble appears in the fact that an examinee’s score will usually 
be found to fluctuate considerably from one parallel test to another, if more than 
one is administered. The psychologist cannot usefully consider that each 
parallel test represents a new psychological dimension—there simply would be 
too many dimensions for practical scientific investigation. Hence, he must 
say that his main interest lies in whatever it is that these parallel tests all have 
in common. 

This leads directly to the useful concept of the examinee’s “true score,” 
which is frequently defined as the average of the scores that the examinee would 
make on all possible parallel tests if he did not change during the testing process. 
In the case of the spelling test, the true score might, by this definition, be the 
proportion of the words in the dictionary (or in some selected list) that the 
examinee knows how to spell. The error of measurement ($e_a$) is, by definition,
merely the difference between the examinee’s actual score ($t_a$) and his true score
($\tau_a$):

$e_a = t_a - \tau_a$.

The reader is referred to Gulliksen [3, chs. 2-5] for a basic treatment of true 
scores and errors of measurement and to various texts [3, chs. 6-10, 14-16; 2, 
chs. 13-14; 13, ch. 12] for an exposition of methods in current use for dealing 
with some of the psychometric problems caused by the errors of measurement. 

The basic task of mental test theory must of necessity be to use the observed 
test scores in order to draw inferences about true scores. Since the observed 
scores differ from the true scores only because of errors of measurement, the 
psychologist can be interested in the observed scores only insofar as they pro- 
vide information about true scores. This fact is not always as apparent as it 
should be, since tests are commonly used for selection purposes, and it is usually 
sufficient for practical purposes in such cases simply to select those examinees 
having the highest observed scores. There are other important practical uses of 
tests, on the other hand, that do require methods for making inferences about 
true scores; two of these will be described in the two last sections. 


2. THE ASSUMPTION OF NORMALLY AND INDEPENDENTLY 
DISTRIBUTED ERRORS 


Problems of mental test theory can be of professional interest to statisticians 
only when the errors of measurement have been conceptualized as stochastic 
variables. Two ways in which this may be done will be mentioned here. 

A classical assumption is that each error of measurement is distributed
normally, with zero mean, independently of true score. Thus, the conditional
distribution of the observed score of examinee $a$ for given true score is

$G(t_a | \tau_a) = N(\tau_a, \sigma^2)$,


where the expression on the right is the usual one denoting a normal
distribution with mean $\tau_a$ and a fixed variance $\sigma^2$ independent of $a$.

$\sigma^2$ is also the variance of the errors of measurement. Under certain
circumstances a good estimate of $\sigma^2$ can be obtained by administering two parallel
forms of the same test to the same group of examinees. (The correlation coeffi- 
cient between two parallel test forms in such circumstances is the important 
statistic known as the test reliability.) In cases where the necessary assumptions 
are not met, there frequently are other, reasonably satisfactory procedures 
available for estimating the variance of the errors of measurement [3, chs. 15, 
16]. Hereafter, then, the variance of the errors of measurement will be
considered as a given quantity, known within a satisfactory approximation.

The assumption that each error is distributed $N(0, \sigma^2)$ independently of true
score is probably quite adequate for many purposes. However, it is clear that
these assumptions can not be met when the true score, expressed as a
proportion of the number of items in the test, is near zero or near one. If $n$ is the
number of test items, and $\tau_a/n$ is some small number like .01, it is intuitively obvious,
in view of the fact that the observed test score can never be negative, that 
the distribution of the errors of measurement will in all probability be skew, 
and that the standard deviation of this distribution will surely be less than if 
the true score were not so near to zero. 


3. THE ITEM-SAMPLING MODEL FOR ERRORS OF MEASUREMENT 


There is another way of thinking about errors of measurement that avoids 
this difficulty. As suggested before, it is almost always possible to imagine a 
large pool of test items from which many tests could be built, each of which 
would be considered an equally satisfactory substitute for the test actually 
administered. This pool constitutes a population of items which may be thought 
of as classified into strata on all relevant characteristics, such as item content, 
item difficulty, item discriminating power, and so forth. A whole series of parallel 
tests may be produced by drawing stratified random samples of items from this 
population (this is the essential feature of the Kuder-Richardson definition of 
“rationally equivalent” tests [5]). The examinee’s true score may be thought 
of as the average of the scores that he would obtain on all such samples of items. 
This definition of parallelism provides a stochastic process that gives rise to the
errors of measurement.¹

The situation can be simplified by ignoring the stratification and assuming
the test items to be selected by simple random sampling [7]. There is
experimental evidence (e.g., [1]) to show that the approximations introduced by
ignoring the stratification are not too large in many cases. If, as is commonly the
case, the test items are scored either zero or one, it is seen that the conditional
distribution of test score for given true score is a binomial distribution:


$f(t_a | \tau_a) = \binom{n}{t_a} (\tau_a/n)^{t_a} (1 - \tau_a/n)^{n - t_a}$.


¹ This model has recently been developed by the author in “The Joint Cumulants of True Values and Errors
of Measurement,” Annals of Mathematical Statistics, in press; and in “Inferences About True Scores from Parallel
Test Forms,” Educational and Psychological Measurement, in press.

It is worth noting that in this formula, although the errors of measurement are
uncorrelated with true score, the shape of the distribution of the errors of
measurement is definitely dependent on the true score.
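The dependence of the error distribution on the true score can be illustrated with a short computation. The sketch below assumes the simple item-sampling model, in which the number-right score on an n-item test is binomial with success probability equal to the true proportion; the function names and the choice n = 100 are assumptions made for the illustration.

```python
from math import comb

def score_distribution(n, zeta):
    """Binomial probabilities of scores 0..n for true proportion zeta."""
    return [comb(n, t) * zeta ** t * (1 - zeta) ** (n - t) for t in range(n + 1)]

def error_moments(n, zeta):
    """Mean, variance, and skewness of the error t - n*zeta under the model."""
    pmf = score_distribution(n, zeta)
    tau = n * zeta
    mean = sum(p * (t - tau) for t, p in enumerate(pmf))
    var = sum(p * (t - tau) ** 2 for t, p in enumerate(pmf))
    third = sum(p * (t - tau) ** 3 for t, p in enumerate(pmf))
    return mean, var, third / var ** 1.5

n = 100
mean_mid, var_mid, skew_mid = error_moments(n, 0.50)  # true score in mid-range
mean_low, var_low, skew_low = error_moments(n, 0.01)  # true score near zero
```

In both cases the mean error is zero, so the errors are uncorrelated with true score; but the error variance is n times zeta times (1 - zeta), which shrinks near the ends of the scale, and the distribution is symmetric only at the middle.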


4. ESTIMATION OF AN EXAMINEE’S TRUE SCORE 


Given either of the foregoing assumptions about the nature of the condi- 
tional distribution of observed score for fixed true score, it should be possible 
to use any set of observed scores to make inferences about the corresponding 
true scores. To start out with, for either of the conditional distributions already 
mentioned, the observed score is itself a sufficient statistic for estimating the 
true score. 

In spite of this fact, the observed score is not an appropriate estimate of true 
score in most practical situations. In a typical situation an entire group of 
examinees is tested—for example, all the freshmen at Princeton University, or 
all the applicants for medical colleges throughout the country. When a dis- 
tribution of observed test scores is available, there is one piece of information 
about each individual tested that was not taken into account when the observed 
test score was described as a sufficient statistic for estimating true score: there 
is the additional information that each individual is a member of the group. It 
may be known, for example, that he is a freshman at Princeton University and 
not a four-year-old child, or a poodle dog, or a man from Mars. 

This information has a very real effect on the making of inferences regarding 
true scores. If the first 999 students in the sample display a bell-shaped fre- 
quency distribution of observed test scores, it will be extremely surprising if 
the next student in the sample turns up with an observed test score six standard 
deviations above the group mean. To take a less extreme example, it will be 
mildly surprising if his score is a mere two standard deviations above the 
group mean. This surprise has a clear, logical consequence when it comes to 
inferring the true score of the student whose observed score is very high for his 
group: the most plausible inference is that the error of measurement for this 
student was probably positive, and hence his true score is probably somewhat 
closer to the mean than his observed score. 

The conclusion is that whenever a homogeneous group of examinees has 
been tested (a group which could, perhaps, be considered as a random sample 
from some hypothetical population), then there is available some information 
that was not taken into account when the observed score was described as a 
sufficient statistic for estimating the true score. When this additional informa- 
tion is taken into account, it must be concluded that whenever the observed 
score of an examinee is above the mean of his group, the best estimate of his 
true score must be somewhat less than his obtained score; and similarly when- 
ever the observed score of the examinee is below the mean of his group, then 
the best estimate of his true score will be somewhat higher than his obtained 
score. 

Note that this is not a case where the estimate of a parameter is modified
because of a priori information about the frequency distribution of that
parameter. Instead, it is a case where the sample itself provides information about
the frequency distribution of the parameter to be estimated. If the sample
distribution is bell-shaped, this clearly indicates that the true-score distribution in
the group tested is not rectangular, for example. The situation is very similar
to one recently treated in several articles in the statistical literature [4, 9, 10,
11].

In current psychometric theory, the only standard procedure for estimating
true scores is by means of a regression equation:


$\hat{\tau}_a = \mu_\tau + \beta_{\tau t}(t_a - \mu_t)$,


where $\mu$ is a group mean and $\beta_{\tau t}$ is the ordinary regression coefficient of true score
on observed score [8]. There is usually no difficulty in obtaining reasonably
good estimates of the means and of $\beta_{\tau t}$ from sample data. This equation would
therefore be reasonably satisfactory if the regression of true score on observed 
score were known to be linear. While a good approximation to linearity may 
hold in most practical situations, one can easily imagine situations where linear- 
ity would not even be approximated—for example, a situation where the true- 
score distribution is actually dichotomous (the reader may wish to examine 
this case for himself). Hence this equation does not provide a theoretically sat- 
isfactory estimate. Some method that does not require the assumption of lin- 
earity needs to be worked out. 
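
The linear regression estimate discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the classical model, in which the regression coefficient of true score on observed score equals the test reliability and the true-score mean equals the observed-score mean. The function name and the sample scores are hypothetical.

```python
from statistics import mean


def regressed_true_scores(observed, reliability):
    """Linear regression estimates of true scores.

    Assumes classical test theory, under which the regression
    coefficient of true score on observed score equals the
    reliability and both kinds of score share the same group mean.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    group_mean = mean(observed)
    # Each estimate is pulled toward the group mean by the factor (1 - reliability).
    return [group_mean + reliability * (t - group_mean) for t in observed]


scores = [40, 50, 60, 70, 80]  # hypothetical observed scores
estimates = regressed_true_scores(scores, reliability=0.8)
```

Scores above the group mean are estimated downward and scores below it upward, which is the behavior described in the preceding paragraphs. The sketch inherits the linearity assumption that the text criticizes.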


5. ESTIMATING THE FREQUENCY DISTRIBUTION OF TRUE SCORES 


A problem closely related to the foregoing is the problem of estimating the 
shape of the frequency distribution of true scores for a group of examinees. (In 
fact, if the shape of this frequency distribution can be estimated, then the 
shape of the regression of true scores on observed scores can also be estimated, 
and a method for estimating true scores from observed scores by means of a 
curvilinear regression equation will have been found; or, even better, the true 
scores can be estimated by some sort of empirical Bayes procedure.) As a first 
step, it is necessary to know if the frequency distribution of obtained scores for 
examinee a, $f(t_a \mid \tau_a)$, is independent of the corresponding distribution for examinee
b, $f(t_b \mid \tau_b)$. Under the classical theory with normally and identically distributed
errors, these two distributions are clearly independent. They are also
independent under the model that pictures the errors of measurement as arising 
from stratified random sampling of items, but in this case there is no convenient 
formula for the frequency distributions available. Under the model that as- 
sumes simple random sampling of items, there is a convenient formula for these 
frequency distributions (given in section 3), and they are almost, although not 
quite, independent. 

Assuming independence, the distribution of observed scores is equal to the 
integral of the distribution of true scores times the conditional distribution of 
observed scores: 


$$\phi(t) = \int g(\tau)\, f(t \mid \tau)\, d\tau.$$


This is an integral equation that can be solved to determine the shape of the distribution
of true scores once the form of the other two distributions is given.
To the writer’s best knowledge, this approach has not yet been tried out in
psychometric work, although it appears to be very useful in the solution of certain
very similar problems in astronomy [12].
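
A rough numerical sketch of the forward direction of this integral equation follows. It assumes, purely for illustration, that the true score is the proportion of items an examinee knows and that the conditional distribution of observed scores is binomial; the text says only that a convenient formula exists under simple random sampling of items. The integral is approximated by a sum over a grid of true scores, and all names are hypothetical.

```python
from math import comb


def observed_score_distribution(true_grid, true_weights, n_items):
    """Approximate the distribution of observed scores t = 0..n_items.

    true_grid    -- grid of true scores, as proportions in [0, 1]
    true_weights -- probability mass at each grid point (should sum to 1)
    The conditional distribution of t given the true score is
    assumed to be binomial (an assumption of this sketch).
    """
    dist = []
    for t in range(n_items + 1):
        total = 0.0
        for tau, g in zip(true_grid, true_weights):
            total += g * comb(n_items, t) * tau**t * (1 - tau) ** (n_items - t)
        dist.append(total)
    return dist


# Dichotomous true-score distribution: half the group at 0.3, half at 0.8.
phi = observed_score_distribution([0.3, 0.8], [0.5, 0.5], n_items=10)
```

Solving the integral equation runs the other way: given the observed distribution and the conditional distribution, one looks for the weights on the true-score grid that reproduce the observed distribution.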


6. THE HYPOTHESIS THAT TWO TESTS BOTH MEASURE THE 
SAME PSYCHOLOGICAL DIMENSION 


Actually, there is very little in mental test theory on the problem of inferring 
the shape of the distribution of true scores for a group of examinees. It would 
seem that this is a problem that must be dealt with before mental test theory 
can go much further. Its solution may be basic to that of a rather important 
practical problem—that of determining whether two tests measure the same 
thing. Psychologists are continually publishing new tests of all varieties, but do 
two published tests purporting to measure the same dimension actually do so? 
Do two given tests of intelligence really measure the same thing? We have no 
adequate statistical test of this hypothesis. We are in the position of a scientist 
who continually builds measuring instruments but cannot tell whether two in- 
struments are measuring the same or different physical properties! 

If the true scores on two tests have a perfect curvilinear correlation, then the 
two tests can be said to be measuring the same characteristic. There is available 
a satisfactory method for estimating the size of the correlation between the true 
scores on two different tests only when their relationship can be assumed to be 
linear [3, pp. 101-4; 2, pp. 400-2; 13, pp. 299-301]. The assumption that the 
true scores of two tests have a linear relationship may often be plausible when 
the tests are at about the same difficulty level, but it is not even plausible when 
the tests are at distinctly different difficulty levels. 

The problem here can be formulated in terms of estimating the bivariate 
frequency distribution of two true scores. For two tests, the bivariate distribu- 
tion of observed scores is equal to the double integral of the bivariate distribu- 
tion of true scores multiplied by the two conditional distributions of observed 
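scores:

$$\phi(t_1, t_2) = \iint g(\tau_1, \tau_2)\, f(t_1 \mid \tau_1)\, f(t_2 \mid \tau_2)\, d\tau_1\, d\tau_2.$$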

To determine whether two tests measure the same thing, it is necessary to test
the hypothesis that $\tau_1$ and $\tau_2$ are really functionally related rather than stochastically
related. It should be possible to develop a statistical test of this hypothesis
from the foregoing equation.


7. THE MEASUREMENT OF CHANGE 


Another situation where a consideration of true scores becomes quite important
and where satisfactory methods are at present available only for the linear
case is in the measurement of change. Suppose two parallel tests ($t_1$ and $t_2$) have
been given to the same examinees on two separate occasions. It is desired to
estimate the true “gain” or change ($\tau_{2a} - \tau_{1a}$) in each examinee during the time
elapsed. Now, in the simple case where there has been only one testing, the
actual observed scores will rank the examinees in the same rank order as would
the best estimates of the true scores. It has been customary to assume, by
analogy, that when two parallel tests have been given, the observed gains
($t_{2a} - t_{1a}$) rank the examinees in the same rank order as would the best estimates
of the true gains. There is actually no such convenient relationship [6]. As a
result, this is a case where the problem of the estimation of true values becomes
of direct practical importance in the use and interpretation of mental test
scores.

It is seen to be logically necessary to compare the observed bivariate distribution
of $t_1$ and $t_2$ with the distribution that would have been found if these two
parallel tests had been administered virtually simultaneously. The latter distribution
can sometimes be approximated experimentally, or it can be inferred
theoretically from an acceptable statistical model for the relation between true
scores and errors of measurement. Any difference between (a) the bivariate distribution
of $t_1$ and $t_2$ observed when the two tests are separated by a time interval
(and perhaps by some experimental treatment) and (b) the distribution
that would have been found if no time had elapsed must
logically be due to changes in true scores rather than to mere errors of measurement.
In this way, it should be possible to make inferences about the true
change in each individual examinee. Practical methods for doing this have not
as yet been worked out in detail.

Sometimes a question is raised as to whether the test-score metric is sufficiently
meaningful so that it is of some use to make inferences about gains or
losses. Suppose that $t_1$ and $t_2$ each consist of a random sample of 100 words
from the large Webster’s Dictionary. If it is estimated that a student’s true
score has increased by five points during the school year, this means that it is
estimated that he can now spell five percent more of the words in the dictionary
correctly than he could a year ago. Correspondingly, if the estimate of a student’s
true gain is negative, this means that it is estimated that he can now spell
fewer words in the big Webster than he could a year ago. Stated in these terms,
the utility of the estimates should be obvious.


Purposes of Scaling Techniques and Choices of Metrics. Robert P. Abelson, Yale University.


Many substantive problems in the social sciences suffer from the lack of quantitative variables readily at hand.
Thus arises the need for scaling procedures. Many of the procedures presently available are briefly summarized. 
The different methods differ by very little. Small differences are for some purposes crucial, for other purposes 
negligible. The present paper discusses how much difference makes a difference, depending upon the investigator's 
purpose. Published studies in psychological literature using scaling techniques have been reviewed for the sensitivity 
of the conclusions to possible scale distortions. 

The studies fall into three groups: (i) conclusions totally insensitive to the scale; (ii) conclusions based essen- 
tially on correlation coefficients, possibly sensitive to the scale; (iii) conclusions highly sensitive to the scale. There 
are many indications that the possible sensitivity under (ii) is quite small. Approximately 90% of published
studies fall under (i) and (ii). Thus, in the main, choice of scaling method rightly depends more on feasibility than on 
the metric properties of the method. The 10% of studies exceptional to this generalization are discussed in some detail. 


Survey Methods and Medical Care: Strategy and Tactics of a Research Program. Odin W. Anderson, Health
Information Foundation.


In 1950 the Health Information Foundation was chartered by the drug, pharmaceutical, chemical and allied
industries as a research foundation to conduct research in the broad problems of the social and economic factors
in the health field, particularly personal health services. The Foundation is tax-exempt, non-profit and may not
engage in propaganda or influence legislation. The paper will attempt to show how a research program directed to
current issues of public policy was formulated, the research resources and methods drawn, and the rationale for
choice of problems. Of particular relevance to the American Statistical Association should be the steps taken in
formulating an active research program using the research methods and techniques available to elucidate problems
in the health field. Survey research methods have been used extensively and research has been directed mainly at
consumer problems in financing and utilizing personal health services. As one project was nearing completion, other
research problems were indicated, thus building up a body of knowledge and facts. Some highlights of data will be
presented.


Methods of Measuring Differences in Social Classes. Theodore R. Anderson, Yale University.


This paper reports an application of factor analysis to problems of identifying the existence of, and measuring
the social distance between, social classes. Illustrative data are drawn from responses to the mass media of communication
and choice of residence by location. The population was subdivided into categories according to occupation.
Within each category the proportion of persons acting in each of several specific ways was determined (e.g.,
the proportion of persons regularly reading the New York Times). The set of proportions for any one occupation
provides a profile of the action patterns of persons within the occupation. This profile was correlated with the profile
of each other occupation. The resulting matrix was factor analyzed. The factors represent both a means of conceptualizing
social space and a means of measuring class differences.

For the illustrative populations (New Haven residents in each case), the factor analyses revealed the existence 
of a small number of occupational clusters, characterized by reasonably sharp differences between the clusters. This 
class differentiation was most marked in the residence study, where inter-occupational differences in income are 
particularly important. 


Economic Considerations in Wage Determination. Jules Backman, New York University.


Wage inflation takes its toll in rising prices, lower profit margins, or unemployment. Higher prices cut the pur- 
chasing power of all wages and benefits received under security programs. Reduced profits adversely affect the in- 
centive to invest in new plant and equipment. The result is fewer job opportunities and a slower rate of gain in 
productivity. Unemployment, attending excessive labor cost increases, means that those who hold their jobs obtain 
their higher real earnings at the expense of those who lose jobs. 

Increases in money wages and non-wage benefits have been exceeding gains in output per manhour by two or 
three percent a year. The net impact has been called “creeping inflation.” Some economists have suggested that it 
is impossible to hold down labor cost increases to the level of gains in output per manhour. Presumably, this 
problem arises because of the power built up by the labor unions. 

While price increases of two or three percent a year seem small, they aggregate into a major erosion of purchas- 
ing power over a period of years. The prospect that half of the purchasing power of the dollar would be wiped out 
in a generation is certainly no grounds for complacency. There can be no assurance that inflation would continue 
to creep. As the steady erosion in the purchasing power of money takes place, persons will seek to protect themselves 
by anticipating the price rise. The resulting flight from money would accelerate the rate of increase in prices. These 
developments will require support from monetary and fiscal inflation. That this support would be forthcoming seems 
probable. We cannot escape this dilemma so long as we persist in tolerating wage inflation and insist upon full 
employment. 
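
The arithmetic behind the claim about purchasing power can be checked with a short Python sketch. It assumes a constant annual rate of price increase compounded yearly, which is a simplification introduced here; the function names are illustrative.

```python
from math import log


def years_to_halve(annual_inflation):
    """Years until purchasing power falls to half at a constant inflation rate."""
    return log(2) / log(1 + annual_inflation)


def purchasing_power(annual_inflation, years):
    """Fraction of the original purchasing power left after the given years."""
    return (1 + annual_inflation) ** -years


for rate in (0.02, 0.03):
    print(f"{rate:.0%}: halves in {years_to_halve(rate):.1f} years")
```

At three percent a year the halving time is roughly 23 years, and at two percent roughly 35 years, which is consistent with the statement that half of the dollar's purchasing power would be lost in about a generation.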


The New Canadian Index of Industrial Production. V. R. Berlinguette, Dominion Bureau of Statistics.


The recently completed major revision of the Canadian index is described in terms of general concepts and
methods used as follows: 

(a) The first part of the paper describes the shortcomings of the old index and the need for revision. The 
coverage, basis of classification and conceptual framework of the index are then outlined and its integration with the 
National Accounts and total real output systems on the basis of Gross Domestic Product at factor cost is discussed. 
Results of experiments using different concepts as the basis of weighting (factor cost vs market prices—G.D.P. vs 
Census value added) are described. 

(b) The annual benchmark indexes were constructed from the comprehensive data on products and materials
derived from Canada’s annual censuses of manufactures, mines and electric utilities. Roughly one-half of the manu- 
facturing benchmark indexes measure the volume of value-added (deflated output in constant dollars less deflated 
materials and fuel inputs in constant dollars). The significance of these “net” series and their comparison with the 
corresponding gross series are discussed. 

(c) The use of man-hours in the monthly index to represent output and the development of man-hour adjust- 
ments to correct for estimated changes in output per man-hour are analysed. 

(d) A brief account is given of the general content of the monthly index together with a description of the 
approach and problems involved in seasonally adjusting the components. 


Definition of Population Clusters and the Contiguity Ratio. James M. Beshers, Purdue University.


Population clusters may delineate areas, either for administrative or research purposes. The clustering of vari- 
ous population characteristics within these areas can be examined. The contiguity ratio, developed by R. C. Geary, 
is an appropriate measure of the spatial clustering of characteristics of areas. The contiguity ratio is a two-dimensional
generalization of the Von Neumann ratio used in time series analysis. Especially intriguing possibilities for
analysis grow out of the use of the contiguity ratio in conjunction with regression methods. Specifically, one may
test the clustering effect in the remainders after the effects of independent variables have been removed from
dependent variables. If the residuals are not clustered, then the independent variables have accounted for the 
clustering effect. The use of distance measures for independent variables permits the construction of surfaces repre- 
senting the clustering effect, and also permits the testing of many of the ecological hypotheses concerning spatial 
distribution. 
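
For readers unfamiliar with the contiguity ratio, a minimal Python sketch follows. It uses the commonly cited form of Geary's ratio with binary, symmetric contiguity weights, which is an assumption of this illustration rather than something stated in the abstract. The function name and the toy data are hypothetical.

```python
def contiguity_ratio(values, neighbors):
    """Geary's contiguity ratio c for areal data.

    values    -- one numeric observation per area
    neighbors -- dict mapping each area index to the indices of the areas
                 contiguous to it (binary, symmetric contiguity assumed)
    Values well below 1 suggest that similar areas lie next to each other.
    """
    n = len(values)
    mean = sum(values) / n
    total_ss = sum((x - mean) ** 2 for x in values)
    # Each contiguous pair is counted once in each direction.
    pair_ss = sum(
        (values[i] - values[j]) ** 2
        for i, adjacent in neighbors.items()
        for j in adjacent
    )
    link_count = sum(len(adjacent) for adjacent in neighbors.values())
    return (n - 1) * pair_ss / (2 * link_count * total_ss)


# Four areas in a row: 0-1-2-3
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
smooth = contiguity_ratio([1, 2, 3, 4], chain)        # neighbors are alike
alternating = contiguity_ratio([1, 4, 1, 4], chain)   # neighbors differ
```

Applying the same function to regression residuals instead of raw values gives the kind of check the abstract describes: if the residuals show no clustering, the independent variables have accounted for it.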


Typifying Life Behavior from Sample Test Data. E. G. Branco, General Electric Company. 


Paramount to any reliability investigation is a substantial knowledge of the device's failure mechanism. We 
would define reliability as the probability of surviving a defined duration of environment, given the conditional 
density function of the device and the environment experienced by the device. 

It is shown that the conditional density structure is the basic function typifying life behavior and is practically
synonymous with the concept of failure rate as a time-varying quantity. Particular emphasis is placed on the exponential
distribution, not as a basic life-time distribution, but as a special case of other more general expressions
which are more appropriate to a device's failure propensity and which have wider and sounder appeal from a physical
standpoint. The principal intent here is to show how, in practice, one faces a multitude of z(t) structures, hardly
any of which exhibit exponential behavior (i.e., $z(t) = \lambda$) but most of which exhibit behavior typical of the more
general failure forms (e.g., Weibull; Type III; Gumbel; etc.).

Slides illustrate this variety of behavior for different devices as well as for similar devices under varying en- 
vironments. 
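
The contrast between a constant and a time-varying failure rate can be illustrated with the Weibull form, one of the general families the abstract names. The Python sketch below uses the standard two-parameter Weibull failure rate; the parameter values are made up for illustration and do not come from the paper.

```python
def weibull_failure_rate(t, shape, scale):
    """Failure rate z(t) of a two-parameter Weibull life distribution, t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


times = [100.0, 500.0, 1000.0]  # hypothetical operating hours
# shape = 1 reduces to the exponential case: z(t) is the same at every t.
constant = [weibull_failure_rate(t, shape=1.0, scale=1000.0) for t in times]
# shape > 1 gives a failure rate that rises with age (wear-out behavior).
wear_out = [weibull_failure_rate(t, shape=2.5, scale=1000.0) for t in times]
```

With a shape parameter of 1 the failure rate equals the reciprocal of the scale at every age, which is the exponential special case; any other shape value makes the rate depend on time.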


The Engineering Shortage—The Economics of the Question. David M. Blank, Columbia Broadcasting System.


The question of the magnitude, or indeed the existence, of the widely heralded engineering shortage of recent 
years is difficult to answer because the term “shortage” has been used in different ways by different people. With 
regard to those kinds of shortage about which the economist can contribute some understanding, the available evi- 
dence suggests that the degree of shortage has been substantially overstated. 

One suggestion of shortage is based on the view that the country’s demand for engineers should be greater than 
it is; this view is not subject to economic test. Another type of shortage results when wages are restrained by govern- 
ment action from rising to their equilibrium level; however, the market for the bulk of engineers has been relatively 
free in recent years, so that this type of shortage probably has not existed. A third type of shortage is claimed to 
exist when demand increases faster than supply and wages consequently rise (presumably relative to the general
wage level). Under these conditions some users of engineering services at the old wage level no longer find it economi- 
cally possible to use these services at the new higher wage level, and therefore complain of shortage. But salary 
data on engineers suggest that this occupation has gained little, if any, relative to other occupations. 


Randomization and Least Squares Estimates. G. E. P. Box and M. E. Muller, Princeton University.


It has frequently been stated that after randomization has been carried out a standard least squares analysis 
can proceed as if the observations were uncorrelated. It is easy to see that infinite repetitions of sampling including 
randomization do not result in a variance-covariance matrix for the observations in which there are necessarily zero 
covariance terms. Instead, a patterned matrix is obtained whose nature depends on the randomization procedure 
adopted. If the variance-covariance matrix V of the original observations were known, then, in an obvious matrix 
notation, the least squares estimates of minimum variance would be provided by 


$$b = (X'V^{-1}X)^{-1}X'V^{-1}y \tag{1}$$

and for a general matrix V this would provide a different result from

$$b = (X'X)^{-1}X'y, \tag{2}$$


given by the ordinary method of least squares appropriate when the observations are independently distributed.
It is shown for the case of the fully randomized design and for the randomized block design that the vectors of 
matrix X become latent vectors of the variance-covariance matrix V generated by the randomization procedure, 
so that (1) is in fact equivalent to (2). Similar equivalence may be found for the analysis of variance sums of squares 
and the variance-covariance matrix of estimates. 
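
One way to see why such an equivalence can hold is sketched below. The sketch assumes that $V$ is symmetric and nonsingular and that the column space of $X$ is invariant under $V$, that is, $VX = XA$ for some nonsingular matrix $A$; this is the case when the columns of $X$ are latent vectors of $V$. The matrix $A$ is notation introduced for this sketch and does not appear in the abstract.

```latex
\begin{aligned}
VX = XA \;&\Longrightarrow\; V^{-1}X = XA^{-1}, \qquad X'V^{-1} = (A^{-1})'X', \\
(X'V^{-1}X)^{-1}X'V^{-1}y &= \bigl[(A^{-1})'X'X\bigr]^{-1}(A^{-1})'X'y \\
&= (X'X)^{-1}A'(A^{-1})'X'y = (X'X)^{-1}X'y .
\end{aligned}
```

Under that assumption the weighting by $V^{-1}$ cancels, so the generalized and ordinary least squares estimates coincide.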


A Single-Semester, Introductory Course in Statistics for all Students Majoring in Economics and Business Administration.
George F. Break, University of California, Berkeley.


This paper presents the recommendations of a special committee set up recently in the Department of Economics
at the University of California, Berkeley. Agreeing that the course must simultaneously meet the needs of
students who will take no more work in statistics and serve as a first course for those who will, the committee has, at 
this date of writing, made the following recommendations: 

a. The course should concentrate upon the application of quantitative methods of analysis to problems in 
economics and business. 

b. The theory of statistical inference should be presented primarily in courses given by the Statistics Department.
This course will devote not more than 2-3 weeks to that topic.

c. The course should be organized around the solution of problems which the average student will be likely to
encounter again after graduation. These include: changes over time in the level and distribution of real income, the 
measurement and significance of monthly fluctuations in the unemployment rate, process control and acceptance 
sampling for business firms. 

d. Additional courses in the applications of statistical analysis will be given in the Economics Department at 
both the upper-division and the graduate levels. In each case one or more courses taught in the Statistics Department 
will be required as prerequisites. 


Federal Loan Insurance and Guaranty Programs. George F. Break, University of California, Berkeley.


From their inception in the mid-nineteen thirties federal loan guaranties and insurance have grown rapidly 
into a multi-billion dollar enterprise embracing over two dozen different programs whose effects pervade the eco- 
nomic system. No measure of these effects, however, is included in either the conventional or the consolidated cash 
budget. A special analysis of federal credit programs, included for the first time in the budget for fiscal 1952, does 
show the annual volume of private loans insured or guarantied by the federal government, but these figures cannot 
be interpreted in the same way as can those showing federal purchases of goods and services or federal transfer
payments.

This paper is a progress report on a research project, financed by a grant from the Merrill Foundation to the
National Planning Association, and aimed primarily at deriving quantitative estimates of the loan- and income-generating
effects of the federal loan insurance and guaranty programs. The paper will indicate the general nature of
the analysis, proceed from there to a grouping of the programs according to the analytical problems inherent in
them, and conclude with a discussion of the problems specific to each group. It is hoped that the results of these
studies will make possible a more realistic appraisal of federal fiscal actions and policies than can now be made on
the basis of published budget statistics.


Education and Training Films in Statistics. Grant I. Butterbaugh, University of Washington.

The almost total lack of films and filmstrips for use in the teaching of statistics is cited, and the author gives
suggested reasons for the situation. Films might be utilized in teaching statistics for purposes of enlivening a course
of instruction and for giving a change of pace from lectures. However, a course of training, involving discipline or
drill, must include the working of problems. Filmstrips have many advantages over films for training as distinguished
from education about statistics. Action is urged toward the development of additional films and filmstrips
for education and training in statistics through appointment of a film committee of the Association and establishment
of a central film and filmstrip library. The paper includes an appendix giving titles of 30 films, together with
their source, length, type, and price when available; also an appendix listing filmstrips that might be found useful.


Statistics in Economic History. Rondo E. Cameron, University of Wisconsin.

The “marked quantitative interests” of economic history should make it the most “exact” branch of historical 
study. The major obstacles in the way of fulfillment of this potentiality are the scarcity and dubious quality of
quantitative historical data; moreover, these two obstacles are usually related. However, techniques are available 
for judging the reliability of data and allowing for their defects. Quantitative historical data are far more abundant 
than is generally realized although not always easily accessible. The best aids to discovery of data on a given subject 
are a thorough knowledge of the subject itself or of the historical period in which it lies, and a sixth “statistical” 
sense which does not require special training but can be acquired by conscious effort and long practice. Even in the 
absence of precise statistical information, rough estimates of the magnitudes of quantifiable phenomena can be useful. 


The Growth of American Families: Results of a National Survey. Arthur A. Campbell and Pascal K. Whelpton,
Scripps Foundation for Research in Population Problems, and Ronald Freedman, University of Michigan.

The study, Growth of American Families, is based on a 1955 interview survey of a nationwide sample of 2,713
white wives 18 to 39 years old, living with their husbands. The main findings are presented in the book Family
Planning, Sterility and Population Growth, by Freedman, Whelpton, and Campbell. The study shows that impairments
to the reproductive system are widespread, but do not have an important influence on the average number
of births to all couples because many couples have had several children before the onset of such impairments.
Contraception (including periodic continence) is almost universally approved and has been adopted by a
large majority of couples with unimpaired fecundity in all important socioeconomic groups. On the average, the
wives interviewed expected to have 3.0 births altogether. Differences between the average numbers of births expected
by wives classified by religion, education, or farm residence are not large; those between other socioeconomic
groups are very small. The wives’ fertility expectations suggest that socioeconomic differences in fertility will continue
to narrow. They also suggest that the population will continue to grow at a moderate to rapid pace.


Rates of ont Pao Lun Cheng, University of Massachusetts and Leonard

St 

This paper presents the results of an exploratory inquiry relating rates of change in the N.B.E.R. leading, 
coincident and lagging indicators to the magnitude of contraction and expansion phases of business cycles and 
assesses the implications for forecasting future activity. Rates of change in the indicators were measured at four check
points: (1) when a general business turning point was still ahead; (2) after a general business turning point probably
had occurred; (3) when it was clearly established that a contraction (recovery) was under way and (4) after the con- 
traction (recovery) had been under way for many months. 

The study involves (1) ranking the rates of change for each indicator in all the contractions (recoveries) since 
1919; (2) averaging the ensuing ranks of the leaders, coincident or laggers or any combination of the three groups 
for each contraction (recovery) in order to obtain a single summary rank for that contraction (recovery); and (3) 
correlating the average ranks thus obtained with contraction (recovery) ranks based on the entire contraction (recovery).
On the basis of significant correlations brought out by the study, the current recovery should rank as intermediate
compared to the 7 most recent previous cyclical recoveries.


Decision Theory in Elementary Statistics. Herman Chernoff, Stanford University and the Center for Advanced
Study in the Behavioral Sciences.

Twenty years ago Wald introduced decision theory in order to generalize the classical theories of statistics.
His formulation not only contained the others as special cases but clearly indicated that they, particularly testing
hypotheses, were deficient in that they failed to consider explicitly the consequences of decisions. This deficiency is
responsible for considerable difficulty encountered by students in elementary courses.

The advanced mathematical level of the original research in decision theory and the difficulty in shifting from a
well established point of view contributed to the slowness with which these ideas have influenced common methodol- 
ogy and elementary teaching. 

Nevertheless, the basic formulation of decision theory, which regards statistics as decision making under un- 
certainty, is substantially easier than the incomplete classical formulations for untrained students. Standard meth- 
ods are better understood from this point of view. It is now possible to illustrate the major results using mathematics 
familiar to high school students. 

Following in the direction of Girshick, Moses and I have taught an elementary course in statistics from the decision
theory point of view. We have had what we consider to be considerable success with students who are mainly
sophomores and juniors majoring in social sciences, about half of whom have had college mathematics.


Statistics of Collective Bargaining. Ewan Clague, U.S. Department of Labor.


The basic statistics originally used in collective bargaining were occupational wage rates. The cost of living 
developed as a government device to settle wage disputes in war industries in World War I. Productivity, or output 
per man-hour, received some attention during the 1920's, but faded in the Great Depression. Government price and 
wage controls in World War II pushed the price index to major importance; and the postwar inflation (1946-48) 
brought the escalator contracts, which now govern the wages and salaries of over 4 million workers. 

Such contracts also initiated the annual improvement factor, which focused attention on productivity. There 
are also many deferred wage increases without cost-of-living adjustments. The price index was not designed for such 
extreme precision. The productivity data are useful for general economic analysis, but not precise enough for collec- 
tive bargaining. The wage data are deficient in many respects, one of the most important being the dearth of infor- 
mation on the cost of fringe benefits. 


The Social Effects of Tranquilizing Drugs. Dean J. Clyde, National Institute of Mental Health.


In order to pinpoint a drug’s specific effects on social behavior, a new and simple rating procedure has been 
developed. This technique allows a patient to describe his own emotions and behavior, and other observers, such as 
the family, can use the same technique to describe the patient. In this way, we obtain a picture of drug effects from 
different points of view. 

Eight investigators throughout the country are now using the new rating procedure, in collaboration with the 
National Institute of Mental Health. The technique is already proving to be sensitive to rather subtle drug effects. 
All tranquilizing drugs are not alike, and they affect behavior differently from the more traditional sedatives. Recent 
data illustrating these points will be presented. 


The Deflation of the Gross National Product by the Department of Commerce. George M. Cobren, U.S. Department
of Commerce.


The purpose of this paper is to outline the general deflation procedure by which the current value estimates of 
gross national product are converted to physical volume measures by the Office of Business Economics of the U. S.
Department of Commerce. The techniques and problems involved are reviewed first with reference to the annual
estimates of GNP in constant dollars, and then with reference to the newly developed quarterly deflated series
which appear for the first time in the December 1958 issue of the Survey of Current Business. A number of recom-
mendations are made for improving the available basic price information used to derive the constant-dollar GNP
estimates. Finally, the feasibility of further extensions of the deflation procedure is also briefly discussed, with par-
ticular reference to the problems of preparing deflated national product estimates by industry.
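
The arithmetic behind a deflation of this kind can be sketched as follows. This is only an illustration under simple assumptions: each current-dollar component is divided by its own price index and the deflated components are summed. The component names, dollar values, and index levels are hypothetical and are not Department of Commerce figures.

```python
# Hypothetical current-dollar components (billions) paired with price indexes
# (base year = 100). These numbers are invented to show the arithmetic only.
components = {
    "personal consumption": (290.0, 116.0),
    "gross private investment": (60.0, 120.0),
    "government purchases": (90.0, 125.0),
}


def deflate(current_value, price_index, base=100.0):
    """Convert a current-dollar value to constant (base-year) dollars."""
    return current_value * base / price_index


def constant_dollar_total(parts):
    """Deflate each component with its own index, then sum the results."""
    return sum(deflate(value, index) for value, index in parts.values())


def implicit_deflator(parts, base=100.0):
    """Implicit price deflator: current-dollar total over constant-dollar total."""
    current_total = sum(value for value, _ in parts.values())
    return base * current_total / constant_dollar_total(parts)


real_total = constant_dollar_total(components)
print(round(real_total, 1), round(implicit_deflator(components), 1))
```

Deflating component by component, rather than dividing the total by one overall index, lets the implicit deflator reflect the changing mix of expenditures.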


Estimators for the Normal Distribution When Samples Are Singly Censored or Truncated. A. Clifford
Cohen, Jr., University of Georgia.


In life testing, dosage-response studies, target analyses, biological assays, and in other related investigations, 
sample selection or observation is often restricted over some portion of the range of possible population values (i.e. 
for some values of the random variable involved). Samples from which certain population values are entirely ex- 
cluded are described as truncated. Those in which sample specimens with measurements falling in certain restricted 
intervals of the variable may be identified and thus counted but not otherwise observed while the remaining sample 
specimens may be observed without restriction, are designated as censored. This paper is concerned with maximum
likelihood estimation of the mean and variance of the normal frequency distribution when observation in samples 
of these types is restricted in only one “tail” of the distribution. 

Estimators derived in this paper represent a simplification over those previously given for the cases under con- 
sideration here in that they require merely the addition of simple easily computed corrections to the restricted 
sample means and variances respectively. These corrections involve only a single auxiliary function and the practical 
application of estimators given here accordingly necessitates interpolation in only one table rather than in two or 
more as was previously required. In both truncated and censored cases the auxiliary functions are approximately 
linear over moderately wide intervals of the argument, so that accurate interpolation between table entries is rela-
tively easy in both instances. Necessary tables and charts are included.
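
For readers who want to see maximum likelihood estimation from a singly censored normal sample in action, a minimal sketch follows. It does not use the single-table correction method of the paper; it substitutes a standard EM iteration, which converges to the same maximum likelihood estimates. The simulated data, the censoring point, and all function names are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def censored_normal_mle(observed, n_censored, cutoff, iterations=500):
    """MLE of (mu, sigma) when n_censored values are known only to exceed cutoff.

    Uses EM, not the auxiliary-function corrections described in the abstract.
    """
    std = NormalDist()
    n = len(observed) + n_censored
    s1 = sum(observed)
    s2 = sum(x * x for x in observed)
    # Start from the naive estimates based on the observed values alone.
    mu = s1 / len(observed)
    sigma = math.sqrt(max(s2 / len(observed) - mu * mu, 1e-12))
    for _ in range(iterations):
        a = (cutoff - mu) / sigma
        hazard = std.pdf(a) / (1.0 - std.cdf(a))  # phi(a) / (1 - Phi(a))
        # E-step: expected first and second moments of one censored value.
        e1 = mu + sigma * hazard
        e2 = mu * mu + sigma * sigma + sigma * (cutoff + mu) * hazard
        # M-step: ordinary normal MLE on the completed sufficient statistics.
        mu = (s1 + n_censored * e1) / n
        sigma = math.sqrt((s2 + n_censored * e2) / n - mu * mu)
    return mu, sigma


# Simulated example: values above the cutoff are counted but not measured.
rng = random.Random(0)
true_mu, true_sigma, cutoff = 10.0, 2.0, 11.0
sample = [rng.gauss(true_mu, true_sigma) for _ in range(5000)]
observed = [x for x in sample if x <= cutoff]
n_censored = len(sample) - len(observed)

mu_hat, sigma_hat = censored_normal_mle(observed, n_censored, cutoff)
naive_mean = sum(observed) / len(observed)
print(round(naive_mean, 2), round(mu_hat, 2), round(sigma_hat, 2))
```

The mean of the observed values alone is biased downward because the upper tail is missing; the likelihood-based estimate corrects for this.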


Problems in Estimating Federal Government Expenditures. Samuel M. Cohn, Bureau of the Budget.


There are sometimes wide variations between successive estimates of Federal expenditures covering the same 
period, to say nothing of the difference between estimates and actual results. Some of the variations can be explained 
by the problems inherent in the nature of the budget and Governmental processes. This paper lists a number of prob-
lems in estimating Federal expenditures and a number of problems in interpreting such estimates that are publicly
available. It addresses itself to three types of estimates: the long-range projection, annual estimates, and estimates 
for part of the year (such as a month or a quarter). 


How to Appraise Quantitatively the Effects of Government Programs on the Recession—Expenditures and Taxes. 
Gerhard Colm, National Planning Association, and Manuel Helzner.


The paper approaches the problem of a quantitative appraisal of those factors—private and public—which 
were important contributing causes to the recession. Curtailment of Federal government procurement of goods and 
services during 1957, particularly for national security functions, constituted one of the major factors contributing
to the economic decline. These recessive influences of the Federal government emerged at a time when no com-
pensating increases were taking place in the other consuming and investment sectors of the economy. Increases in
government transfer payments, in part resulting from the operation of the built-in stabilizers and in part from the
extension of unemployment benefits, helped to maintain the high level of income and avert a depression. The fact
that the impact of some of these increased government expenditures happened to coincide with the period of eco-
nomic slack elsewhere in the economy may explain why the recession was not as prolonged as some had expected.

The paper evaluates the extent to which currently available statistics can be utilized to derive quantitative 
measurements. The author questions the appropriateness of available data on expenditures or contract placement for
measuring the impact of government activity on the ground that they lead to misinterpretation. Contract place- 
ments are susceptible to gross exaggeration; expenditure figures to understatement. The author attempts, however, 
to approximate a reasonable measurement of the impact of government programs. 


Experiments, Observations and Surveys. Jerome Cornfield, Johns Hopkins University.


Scientific investigations differ in their control of variables other than those of immediate interest. In practice 
the difference between investigations in their ability to control other variables seems to be one of degree and not of 
kind. Three different methods of control may be distinguished: (a) control by predetermination of the levels of
known extraneous variables, (b) control by comparison at the same, but naturally occurring rather than predeter-
mined, levels of known extraneous variables, as by cross classification of survey results, (c) control by randomized
comparative trial. Satisfactory control is often most difficult to achieve in the second case, but no logical difference 
seems demonstrable. In every case the purely empirical task of appraising the particular observed result in the light 
of the possible effect of particular disturbing variables must be undertaken. In particular, the generalization of the 
results of a randomized trial from the units over which randomization was performed to a larger group of units and 
broader set of conditions requires special examination in each particular instance and presents the same kind of 
problems as the interpretation and generalization of survey results. Some illustrative examples are considered. 


Some Recent Developments in Canada’s Quarterly National Accounts. R. B. Crozier, Dominion Bureau of Statistics.


The orientation of the paper is descriptive, and is focussed on quarterly rather than annual data. It deals 
briefly with the evolution of quarterly national income statistics in Canada and discusses recent work which is under- 
way to strengthen and extend the system. The paper is organized as follows: 

(a) Brief background description. The quarterly National Accounts were first published in Canada in Novem- 
ber, 1953. They are deficient for analyses involving the decomposition of value data into its quantity and price 
components. 

(b) Two approaches are being developed, aimed at measuring changes in the volume of output on a quarterly 
basis. (i) The first is the conventional deflation approach, whereby quarterly Gross National Expenditure value data 
are reduced to volume terms by deflating with appropriate price indexes. This material is now published in Canada
on a quarterly basis in unadjusted form, but has not yet been seasonally adjusted. (ii) The second is the real output
by industry approach, whereby indicators of physical volume for each industry, expressed in the form of index
numbers, are combined into a composite total index using base period weights derived from an industry breakdown
of Gross Domestic Product. This material is not yet operational, but is used internally as a check on the results of 
the deflation. 

(c) The need for developing aggregative price indicators on a quarterly basis, and some possibilities in this area, 
are discussed. 

(d) Some tentative results of the above work are given. 
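
The real output by industry approach in (b)(ii) amounts to a base-weighted average of industry volume indexes. The following is a minimal sketch of that calculation; the industry names, weights, and index levels are hypothetical and are not Dominion Bureau of Statistics data.

```python
def composite_volume_index(industry_indexes, base_weights):
    """Base-weighted composite index: sum(w_i * I_i) / sum(w_i)."""
    total_weight = sum(base_weights.values())
    weighted_sum = sum(base_weights[name] * industry_indexes[name] for name in base_weights)
    return weighted_sum / total_weight


# Hypothetical base-period shares of Gross Domestic Product (per cent)
# and hypothetical quarterly volume indexes (base period = 100).
weights = {"manufacturing": 30.0, "agriculture": 10.0, "services": 60.0}
indexes = {"manufacturing": 110.0, "agriculture": 95.0, "services": 104.0}

total_index = composite_volume_index(indexes, weights)
print(round(total_index, 1))
```

Because the weights are fixed at their base-period values, movements in the composite reflect volume changes only, which is what makes it usable as a check on the deflated expenditure series.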


The Relationship of the Centralized Statistical Unit to Management Decision Making. Royal A. Crystal, Con-
necticut Medical Service, Inc.

This paper presents the role of the centralized statistical unit as an aid to management decision making. The 
advantages of a centralized reporting unit over decentralized units and the value of a coordinated statistical program
to management are stressed. The function of statistics and statistical units in the business enterprise is discussed from
the standpoint of line-staff relationships, the use of the data clearinghouse, statistics as an aid to management, and 
statistical cost systems. A centralized unit reporting to management, through the administrative level, is analysed 
with emphasis placed on its purpose and its routine and special functions. Personnel requirements of a centralized 
unit are enumerated. 

The role of the centralized reporting unit as an aid to management decision making is discussed in relation to 
the functions performed within this area. Included are (1) the attitude of the quantitative specialist; (2) assistance 
in shaping policy decisions; (3) the place of surveys, studies, projections, and forecasts in decision making. 


Some Advantages and Limitations of Automatic Computers in Processing Statistical Data. Joseph F. Daly, Bureau
of the Census.


Modern automatic high speed computers have long since demonstrated their usefulness in the job of reducing 
large masses of statistical data to manageable form. 

But although big general-purpose electronic computers can often perform well-specified jobs more efficiently 
than people, they are far from self-sufficient in the statistical data-processing business. Keeping a large computer 
busy summarizing raw statistical reports requires either a very complicated set of operations to be performed on a
small amount of data, or a very large amount of data to be processed through a fixed sequence of operations. More- 
over, it must be assumed that the data are to some extent incomplete and inconsistent. Under these circumstances 
it is exceedingly difficult to plan in advance a sequence of operations which will proceed automatically and yield 
usable results no matter what peculiarities may be encountered in the data. It is even more difficult, as a moment's 
reflection will show, to be certain in advance that the plan will work under actual operating conditions. Unlike 
ordinary mathematical problems, where one can determine by sheer logic what combinations of numbers will cause 
a process to fail, and unlike clerical processing where one can readily examine intermediate results and adjust the
process in minor ways to assure success, fully automatic processing of raw statistical data still presents many serious
problems, solutions to which have been found only in special cases, often by a frustrating process of trial and error.


New Subjects and New Emphasis in the 1960 Housing Census. Wayne F. Daugherty, Bureau of the Census.


New subjects in the 1960 Housing Census reflect the current government programs, such as urban renewal, 
and traffic, sanitary and city planning. The new subjects also reflect changing trends in housing standards. Most 
of these are aimed at providing basic information for (a) measuring the quality of the inventory, (b) analyzing hous-
ing markets, and (c) studying patterns of residential financing. Some fundamental changes which have been re-
quested were withdrawn because of the necessity to not increase the cost of the 1960 program over that of 1950.
For the same reason, certain subjects have been dropped because they are no longer descriptive of the housing in- 
ventory or are no longer sensitive indicators of housing trends. 

A significant shift in emphasis is the inclusion of a measure and description of the gross changes in the housing 
inventory during the decade 1950 to 1960—the demolitions, conversions, and new construction. Further, coverage 
of housing has been broadened to include more types of living accommodations than were included in 1950.


Quality Changes and Manufacturing Indexes. Frank de Leeuw, Board of Governors of the Federal Reserve System.


Economists agree that our present price and production measures do not adequately reflect changes in “quality”;
but there is no agreement as to the significance or even the precise meaning of the deficiency. This paper begins
with a definition of quality change and some generalizations about its direction and its various forms. It illustrates 
the generalizations by examining an important example of quality change in the capital equipment area—the changes 
in capacity, fuel requirements, and labor requirements of steam-turbine generator sets. The paper then presents 
some evidence bearing on the relative amounts of quality adjustment in different types of output or price indexes. 
The evidence consists of a series of comparisons of output measures calculated in different ways—by counting units, 
by subdividing total units into detailed size and style categories, and by deflating value. Finally, the paper touches 
briefly on some recent writing about the significance of quality changes. 


The Role of the Statistical Consultant. W. Edwards Deming, New York University.


There are several new concepts in sampling that affect the interpretation of statistical calculations and facilitate 
communication between statisticians and experts in subject-matter. Some of the concepts are so new that they are 
barely in print; others are already in general use. Some new concepts are the universe, the frame, the equal complete 
coverage, a logical definition of the margin of sampling error, and the statistical control or audit for the detection 
and measurement of various operational sampling errors and biases. These concepts greatly enhance the usefulness 
of the results of sampling in judicial proceedings and in scientific work. We now see the burden of proof on the equal
complete coverage, and on definitions and judgments that stem from substantive knowledge, which is where the
burden belongs, in contrast with the objectivity of statistical calculations.

The equal complete coverage is defined as the result that would have come from a coverage of all the sampling
units in the frame, conducted with the same interviewers, using the same definitions and care as they expended on
the sample. It then follows logically that the results of a sample are acceptable in place of the equal complete
coverage provided (1) the margin of error of sampling is sufficiently small for the purpose; and provided that (2)
deviations from the prescribed sampling procedure are not sufficiently serious to invalidate the calculation of the 
margin of sampling error. Whether the equal complete coverage would be satisfactory is still another story, depend- 
ing on the frequency and importance of nonsampling errors (e.g., nonresponse, wrong counting, wrong weighting), 
as well as built-in deficiencies of definitions, the questionnaire, and of the frame. 


The National Institutes of Health Study of Smoking and Cancer. Harold F. Dorn, National Institutes of Health.


This report summarizes the results of a prospective study of causes of death among 290,000 men who held
United States Government Life Insurance as of December 1953. This insurance was first issued to members of the
Armed Services of the United States during World War I and continued to be sold until 1940. A history of the life-
time use of tobacco was obtained by questionnaire during the first six months of 1954. The deaths in this
group during 1954 to 1956 provide the data for this report. Mortality rates by cause of death, history of tobacco use,
amount of tobacco smoked, and size of community are presented. A comparison is made of the death rates of non-
smokers and smokers for specific causes of death.


Reliability Program for an Electronic Reconnaissance Set. Julian Edelman, Loral Electronics Corporation.


The assurance that an Electronic Equipment will operate satisfactorily for an extended period of time under 
specified conditions is the reason for establishing an integrated reliability program. The major operating stages of 
such a program may be summarized in the categories set forth below: 1. Establish guides to be used by designers
relative to the use of electronic parts. 2. Accumulate a history of parts. 3. Estimate effective environmental con-
ditions for the equipment. 4. Estimate failure rates based upon part stress and environmental conditions. 5. Imple-
ment a part, assembly and box testing program to confirm estimates of electrical stress and failure rate. 6. Conduct 
continuous analyses and synthesis of testing results and design. 
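
Stage 4 can be illustrated with a simple parts-count calculation. The sketch below rests on assumptions that the abstract does not state: parts fail independently at constant rates, the equipment fails when any part fails (a series system), and stress and environment enter as a multiplier on each base rate. The part names, quantities, rates, and multipliers are hypothetical.

```python
import math

# Hypothetical parts list: (description, quantity, base failures per hour,
# multiplier for electrical stress and environment). Invented for illustration.
parts = [
    ("resistors", 120, 0.02e-6, 1.5),
    ("capacitors", 80, 0.05e-6, 2.0),
    ("tubes", 25, 1.00e-6, 1.2),
]


def system_failure_rate(parts_list):
    """Series system with constant rates: the part failure rates simply add."""
    return sum(qty * base_rate * multiplier for _, qty, base_rate, multiplier in parts_list)


def reliability(failure_rate, hours):
    """Probability of no failure in the given operating time: exp(-lambda * t)."""
    return math.exp(-failure_rate * hours)


rate = system_failure_rate(parts)
mean_time_between_failures = 1.0 / rate
print(rate, round(mean_time_between_failures), round(reliability(rate, 1000.0), 3))
```

Estimates of this kind are what the testing program in stage 5 would then confirm or revise.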


What Americans Think about Their Hospital and Medical Care. Jacob J. Feldman, National Opinion Research
Center.


Some current journalistic evaluations of the public image of the medical establishment create the impression 
that there is mass dissatisfaction with the nation’s physicians and hospitals. Ostensibly, a substantial segment of
the public yearns for the “good old days” of the family doctor and no longer holds sentiments of confidence and re- 
spect for the medical profession. The results of a number of recent surveys are in marked contradiction to this 
journalistic view of the situation. Actually, the bulk of the population has an extremely favorable image of the 
medical establishment, undoubtedly in part as a result of the tremendous advances in medical knowledge during
the last several decades.

Of course, some aspects of the behavior of physicians are viewed with discontent by a considerable proportion
of the population and there is a small minority which is generally disaffected from the medical establishment. But, 
by and large, whatever dissatisfaction there is does not seem to act as an appreciable deterrent to the individual's 
utilization of medical facilities in time of illness. Attitudes toward physicians and hospitals appear to play a rather 
peripheral role with respect to behavior in situations where the need for medical care is marked.


Comparison of Designs for Exploration of Response Relationships. John Leroy Folks, Texas Instruments, Inc.


Polynomial type relationships are considered and the choice of design is restricted to an experimental region. 
Criteria of goodness leading to invariance of relative efficiency of designs under simple linear changes of scale of the 
factor space are considered. They are as follows: 1. minimum maximum variance, 2. minimum average variance, 
3. minimum generalized variance, 4. minimum maximum bias, 5. minimum average bias, 6. minimum average (bias)²,
7. minimum average mean square error. Optimal designs are obtained for most of these criteria for first degree poly- 
nomial relationships and for certain second degree polynomial relationships. 
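
Criteria 1 and 2 can be made concrete for the simplest case, a first degree polynomial in one factor on the region from -1 to +1. In that case the variance of the fitted value at x, in units of the error variance, is 1/n + (x - x̄)²/Sxx. The sketch below compares two four-point designs under this formula; the designs themselves are illustrative choices and are not taken from the paper.

```python
def prediction_variance(design, x):
    """Variance of the fitted straight line at x, in units of the error variance."""
    n = len(design)
    xbar = sum(design) / n
    sxx = sum((d - xbar) ** 2 for d in design)
    return 1.0 / n + (x - xbar) ** 2 / sxx


def max_variance(design, grid_size=201):
    """Criterion 1: largest prediction variance over a grid on [-1, 1]."""
    grid = [-1.0 + 2.0 * i / (grid_size - 1) for i in range(grid_size)]
    return max(prediction_variance(design, x) for x in grid)


def average_variance(design):
    """Criterion 2: prediction variance averaged uniformly over [-1, 1]."""
    n = len(design)
    xbar = sum(design) / n
    sxx = sum((d - xbar) ** 2 for d in design)
    # The mean of (x - xbar)^2 for x uniform on [-1, 1] is 1/3 + xbar^2.
    return 1.0 / n + (1.0 / 3.0 + xbar ** 2) / sxx


endpoints = [-1.0, -1.0, 1.0, 1.0]            # all runs at the ends of the region
equally_spaced = [-1.0, -1.0 / 3.0, 1.0 / 3.0, 1.0]

for name, design in [("endpoints", endpoints), ("equally spaced", equally_spaced)]:
    print(name, round(max_variance(design), 3), round(average_variance(design), 3))
```

Under both variance criteria the endpoint design does better here, although it offers no protection against bias from an unfitted quadratic term, which is the concern of criteria 4 through 7.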


Implications of International Postwar Productivity Trends. Marvin Frankel, University of Illinois.


This survey, necessarily limited in coverage, reveals great differences among countries in the postwar rates of 
productivity advance. For the Soviet Bloc countries, the rates generally have been in excess of 5 per cent per year, 
while for the countries of Western Europe and North America they have averaged significantly lower. From 1948 to 
1956, the Soviet Union's rate of growth in national product per worker approximated 6.5 per cent. (Official Soviet 
data yield a figure almost half again as high). The rates for Poland and Czechoslovakia were in the neighborhood of 
6 per cent. By contrast, comparable figures for the United States, Canada, and the United Kingdom were under 3 
per cent. Among the Asian countries, for many of which data are sparse, Japan stands out with a 1948-56 rate of over
7 per cent. India, for the same period, registered a yearly per capita growth in national product of under 2 per cent. 
The pattern of intercountry differences does not differ much when industrial output per worker, instead of national 
product per worker, is used as a measure. 

In general, the countries achieving the largest productivity gains also registered the largest increases in output 
and showed the highest rates of investment. Rapid advance in the Soviet Bloc countries seems also to be associated
with favored treatment for industry, and minimal emphasis on agriculture and housing, in the allocation of invest-
ment resources. Other factors important in shaping intercountry productivity differences include sectoral shifts in
the distribution of labor and post-war reconstruction activities. 


Effectiveness of Our Tools for Estimating Population Change in Small Areas. Carl M. Frisén, California Depart-
ment of Finance.


This subject poses the question: How adequately do our present estimating tools perform their function? The 
function, for present purposes, is defined in terms of the data needs of governmental and private agencies, represent- 
ing situations in which the demographer is asked to provide specific information for small areas. 

Two classificatory schemes are suggested for purposes of approaching the problem. One provides a framework 
for examining the symptomatic data used, while the other relates to the estimating techniques themselves. These 
lead to a consideration of the availability and accuracy of the data, as well as applicability in terms of technique. 

An effort is made to bring together the results of a number of tests of estimating techniques. Particular attention 
is given to experimental work in the application of the “age-specific-death-rate” method for purposes of estimating 
population by age and in the use of utility data as symptomatic of change in the number of households and in total 
population. 


Fertility Estimates from Birth Statistics. K. R. Gabriel and Ruma Falk, University of North Carolina and Hebrew
University, Jerusalem.


Indices of fertility may be required for populations without statistics of age and sex composition. Such indices
can be calculated from birth statistics by using either the ratio of all births to first births or mean birth order. 
These measures may be computed for each age of mother and averaged with appropriate weights to give indices 
independent of the population age distribution. Standard errors of the measures and of the weighted means are 
readily derived. 
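
The two measures named above, and their weighted averages over ages of mother, can be computed directly from a table of births by birth order. The following sketch uses invented counts and equal weights purely to show the calculation; the age groups, the grouping of fourth and higher births under order 4, and the weights are all assumptions.

```python
# Hypothetical births by age of mother and birth order. Fourth and higher
# births are counted under order 4 for simplicity. Counts are invented.
births = {
    "20-24": {1: 500, 2: 300, 3: 150, 4: 50},
    "25-29": {1: 300, 2: 350, 3: 250, 4: 100},
}
# Hypothetical standard weights, fixed independently of the population's age distribution.
weights = {"20-24": 0.5, "25-29": 0.5}


def ratio_all_to_first(by_order):
    """Ratio of all births to first births for one age group."""
    return sum(by_order.values()) / by_order[1]


def mean_birth_order(by_order):
    """Mean birth order for one age group."""
    total = sum(by_order.values())
    return sum(order * count for order, count in by_order.items()) / total


def weighted_index(measure, table, w):
    """Average an age-specific measure with fixed weights."""
    return sum(w[age] * measure(table[age]) for age in table) / sum(w.values())


ratio_index = weighted_index(ratio_all_to_first, births, weights)
order_index = weighted_index(mean_birth_order, births, weights)
print(round(ratio_index, 3), round(order_index, 3))
```

Neither index requires a count of women at risk, which is the point of basing fertility measures on birth statistics alone.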

Fertility indices based on birth statistics depend on the distribution of births by order. They are related logi- 
cally to maternal fertility rather than to fertility of all women. Empirically some of the indices are found to correlate 
as high as 0.97-0.98 with total maternal fertility as computed from age-of-mother-specific birth-rates. Correlations 
with total fertility are found to be considerably lower. It thus appears possible to obtain from birth statistics 
alone practically as good estimates of maternal fertility as from birth and population statistics. The empirical in- 
vestigation of the correlations was based on data for Australia and Israel. 


Output Measures in the Analysis of Economic Changes. Clayton Gehman, Federal Reserve Board.


Paper discusses pertinence of physical measures of manufactures for analyzing growth and cyclical develop- 
ments in the economy. Census and supplementary sources of benchmark data are noted for purposes of determining 
production trend levels. Strategic character of short run changes in output in interpreting business cycle movements
is discussed with special reference to use of commodity groupings to delineate major market developments. Com- 
parisons of relationships between output by industrial origin and final expenditure are developed with implications 
for analysing changes in inventories and in capital goods output and for evaluating consumption trends. 


Negative Skewness and Its Significance in Relation to Distributions of Performance Ratings of Civil Service Em-
ployees. James P. George, University of Tennessee.


In the social and behavioral sciences it has been said that frequency curves skewed to the left or in a negative 
direction are uncommon. It appears to be the opinion of some authorities that frequency distributions of perform-
ance ratings are apt to be either moderately skewed to the right or nearly symmetrical in conformation.

This paper reports the findings resulting from analyses of more than 250 frequency distributions of perform-
ance ratings of Civil Service employees, representing the Departments of Navy and Agriculture, and the Veterans’
Administration. Of the total number of distributions coming within the scope of this study fewer than five percent 
were skewed to the right. Over-all distributions in a number of instances were sufficiently large to be regarded as 
parent populations or universes in and of themselves. 
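
The direction of skewness in a distribution of ratings can be checked with the moment coefficient of skewness, m3 / m2^1.5, which is negative when the longer tail lies to the left. The abstract does not say which measure of skewness was used, so this choice is an assumption, and the rating scale and frequencies below are hypothetical.

```python
def moment_skewness(values, counts):
    """Moment coefficient of skewness, m3 / m2**1.5, for a grouped distribution."""
    n = sum(counts)
    mean = sum(v * c for v, c in zip(values, counts)) / n
    m2 = sum(c * (v - mean) ** 2 for v, c in zip(values, counts)) / n
    m3 = sum(c * (v - mean) ** 3 for v, c in zip(values, counts)) / n
    return m3 / m2 ** 1.5


# Hypothetical five-point rating scale (1 = lowest, 5 = highest) with most
# employees rated near the top, leaving a long tail toward the low ratings.
ratings = [1, 2, 3, 4, 5]
frequencies = [2, 5, 13, 40, 40]

skew = moment_skewness(ratings, frequencies)
print(round(skew, 2))
```

A distribution bunched at the favorable end of the scale, as in this invented example, yields a negative coefficient.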


The Use of the Annual Survey of Manufactures in an Annual Index of Production. Morris R. Goldman, Bureau
of the Census. 


The Annual Survey of Manufactures is based on a sample of approximately 50,000 manufacturing establish- 
ments accounting for approximately two-thirds of the total employment in manufacturing. The approximately 
7,000 individual commodities covered in the quinquennial Census of Manufactures are grouped into approximately 
1,000 classes of products which are separately measured in each Annual Survey. 

The paper proposes that the Annual Survey of Manufactures be used to develop an interim benchmark series 
for current measures of industrial production. The Annual Survey provides a consistent set of estimates of both 
output and input. The survey is comprehensive in scope, covering all manufacturing activity. At the present time, 
benchmark calculations are made after each complete Census of Manufactures. 

A number of problems would be associated with such use of the Annual Survey. Most important would be the 
fact that measures of the shipments of the 1,000 product classes are in value terms only. However, for almost 30 
per cent of the product class values, individual current surveys which include physical measures of output are con- 
ducted by the Bureau of the Census. For the remaining product classes a deflation procedure would be necessary. 


Recent Employment Trends. Sidney Goldstein, Bureau of Labor Statistics.


This paper traces the patterns of employment in manufacturing industries since 1947 and in each major manu- 
facturing industry group. The descriptive analysis concerns the relation of changes in employment over the post 
World War II decade to the cyclical reactions in each major industry group. It further indicates the magnitudes of 
these changes and the relative positions of employment in the manufacturing industries during this period. 

The paper finds that manufacturing employment begins to be reduced in varying degrees, of course, in different 
industries, generally ahead of the turning points in overall activity. Conversely, the upturn is generally coincident 
with overall activity. 

The most sensitive portion of manufacturing employment in the post-war period consists of those plants pro- 
ducing durable goods. These factories adjusted their employment successively to accumulated World War II 
demands, armed conflict, and plant and equipment expenditures. The most recent down-turn as well as its pattern 
of recovery is described and compared with the previous two downturns in the postwar decade. 

Finally, a differing trend is noted with diverse cyclical reactions between the production worker and nonpro- 
duction worker components of employment in the various manufacturing industries. 


A New Methodology When Equations Outnumber Unknowns. T. N. E. Greville, Social Security Administration.


It has long been known that matrix theory provides a useful shorthand for formulating the solution of a set 
of simultaneous linear equations in the typical case where the number of equations is the same as the number of
unknowns and there is a unique solution. The solution of such a system is, of course, equivalent to obtaining the 
inverse of the matrix of coefficients. 

The Swedish geodesist Bjerhammar pointed out in 1951 that the generalized inverse or “pseudoinverse” an-
nounced by E. H. Moore in 1920 is similarly related to the least-squares solution of a system in which equations
outnumber unknowns. In the present paper it is shown how this principle can be applied to multiple regression and 
least-squares fitting of polynomials. A recursive or column-by-column procedure for obtaining the pseudoinverse 
allows flexibility in the inclusion or exclusion of variables in the regression example, or in choosing the degree of
polynomial to be fitted in the curve-fitting example, without adding to the amount of computation required. 
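
The link between the pseudoinverse and least squares can be illustrated for polynomial fitting. The sketch below does not implement the recursive column-by-column procedure of the paper. It assumes the coefficient matrix A has full column rank, in which case the pseudoinverse equals (AᵀA)⁻¹Aᵀ, and it uses exact rational arithmetic. All function names are hypothetical.

```python
from fractions import Fraction


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(m):
    """Gauss-Jordan inverse of a small nonsingular square matrix of Fractions."""
    n = len(m)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pseudoinverse(a):
    """Pseudoinverse of a full-column-rank matrix: (A^T A)^{-1} A^T."""
    at = transpose(a)
    return matmul(inverse(matmul(at, a)), at)


def polyfit_ls(xs, ys, degree):
    """Least-squares polynomial coefficients, constant term first."""
    a = [[Fraction(x) ** k for k in range(degree + 1)] for x in xs]
    b = [[Fraction(y)] for y in ys]
    return [row[0] for row in matmul(pseudoinverse(a), b)]


# Four equations, two unknowns: fit a straight line to four points.
coefficients = polyfit_ls([0, 1, 2, 3], [0, 1, 1, 2], degree=1)
print(coefficients)
```

Changing the `degree` argument changes the number of columns of A, which is the situation where the column-by-column procedure of the paper avoids recomputing the whole pseudoinverse.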


Minimum Bias Estimation. Wm. Jackson Hall, University of North Carolina.


The Wald approach to estimation theory lays prime emphasis on the risk function (expected loss) as a charac- 
terization of an estimator. Estimators are chosen which will minimize the average (in some sense), or the maximum 
risk. Alternatively, restriction to the class of unbiased estimators may be made and then the risk uniformly mini- 
mised subject to this restriction, thus giving primary attention to bias and secondary attention to risk. 

However, unbiased estimators do not always exist. For this latter situation, we propose “minimum bias esti- 
mators”—estimators which will minimize some bias function (e.g., absolute bias, squared bias, percentage bias) in 
an appropriate manner. Estimators which minimize either the average (in some sense) or the maximum of the bias 
function are considered. We show that such estimators may be obtained by choosing an unbiased estimator of an
“approximation” to the function to be estimated. By employing certain aspects of the theory of approximation,
in the least squares sense and in the Chebyshev (or minimax) sense, we derive results bearing considerable concep-
tual similarity to Wald’s results on minimax and Bayes estimation, with the role of the risk function here being
assumed by a bias function. For example, estimators which minimize the maximum bias are shown to minimize 
also the average squared bias, the average being taken with respect to a “least favorable” weight function. 

Application to some binomial estimation problems is also considered. 


Procedures for the 1960 Census of Population and Housing. Morris H. Hansen, Bureau of the Census.


The methods of taking and compiling the 1960 Censuses of Population and Housing will increase the accuracy, 
reduce the costs, and increase timeliness of publication of results. 

(1) The complete population census will include the basic head count plus information about age, sex, race and 
marital status. Information on education, employment, unemployment, occupation, industry, income, migration and 
other subjects will be collected from a 25 per cent sample of households. Sampling will serve a similar role in the 
housing census. (2) Advance questionnaires will be distributed to give respondents an opportunity to prepare more 
considered answers before the enumerators call. Steps will be taken to improve census coverage through enumeration 
of people where they are found on the census date as well as at home, and to make use of postal mail carriers to aid 
in identifying missed addresses. (3) The data will be collected on questionnaires that involve the use of positioned 
marks (check boxes). Newly developed electronic equipment known as FOSDIC (Film Optical Sensing Device for
Input to Computers) will convert the data to magnetic tape without manual punching. We hope with this equip- 
ment and with the gains from sampling to speed up publication as much as a year and a half. 


On Obtaining an Answer to the Question. Philip M. Hauser, University of Chicago.


One of the several non-sequiturs in the response of the United States to the advent of Sputnik was the assumption
that the nation suffered from a “shortage” of scientific and engineering personnel. The presumed shortage was
supposedly, along with assumed deficient education and training of such personnel in the U.S., vis-à-vis the USSR,
responsible for our failure to match the Soviet performance in space rocketry. Efforts to answer the question of
whether there is a shortage of scientific and engineering personnel, including the papers on this program, are reminiscent
of the efforts during the depression '30’s to guess the number of unemployed. Not enough is known about
either the demand for, or supply of, such personnel to determine whether there was, or is, a “shortage.” Statistical
methods are available, however, which make possible the collection of the necessary information. If benchmark
or current statistics on scientific and engineering personnel are desired, they can be obtained through the extension
of existing programs, and the design and conduct of new surveys using both the population and establishment approach;
and more intensive research investigations are possible to get at the quality, as distinguished from the
numbers, of these experts.


Is Labor the Culprit? Peter Henle, AFL-CIO.


The record of the American economy regarding prices during the postwar period is a relatively good one. By 
far the largest proportion of price increases in the postwar period have been the result of special circumstances 
arising either from the aftermath of World War II or the Korean hostilities. Even during the past two years when 
some economists have assigned the blame to “wage inflation,” most of the price increases recorded by the Consumer
Price Index can be attributed to special circumstances, such as crop conditions, rather than to union-won wage 
increases. 

The charge that wage increases have exceeded increases in productivity and have thus contributed to rising 
prices neglects the fact that when viewed in the context of the entire postwar period, unit labor payments, unit 
non-labor payments, and final prices have all risen approximately the same amount. 

While it is desirable to stress the role of productivity as a source for improvements in living standards, it is 
neither desirable nor practical to attempt to establish by government or private policy a fixed relation between the 
two. Recent developments in wage-price relationships do not provide any compelling reason for altering the basically 
voluntary character of wage settlements negotiated through collective bargaining. 


Preparation of Forecasts of Customers, Sales and Revenue in an Electric Utility. Joan M. Henriques, Long Island
Lighting Co.

This paper will survey the organization geared to forecast the results of a specific public utility operation, the 
techniques used in the preparation of these forecasts and their utilization as reflected in the annual and capital 
budgets. The forecasts of customers, sales and revenue used in the preparation of annual and five year budgets are
prepared by a number of departments using different techniques and from a different point of view. For instance,
System Engineering prepares their forecast of sales from data collated at the point of generation; while the budget
department forecasts annual sales from data collated at the customer level. These forecasts are evaluated by a
planning committee before being presented to top management. 

The techniques used to develop each of these forecasts are described. The second half of the paper will briefly 
consider the forecasts used to formulate plans for capital expansion of generation and distribution plant facilities. 


A Survey of New Product Activity in the United States. Richard C. Henshaw, Jr., Michigan State University;
Milton V. Johns, Jr., Stanford University; and Royce W. Plyler, Humble Oil Refining Co.


The purpose of this survey was to investigate the scope of new product activity in the United States at the least 
possible cost. The information obtained on this dynamic sector of the nation’s economy will be of interest to persons 
concerned with problems of economic growth, marketing, research and development, and publishing. Statisticians
will be interested in the mathematical development of the statistical estimating formulae as well as in the data 
themselves. The information obtained in the survey may be used as a basis for discussing whether the U. S. Bureau 
of the Census should collect such data directly from manufacturers in subsequent Census of Manufactures. 

This paper presents for the first time the results of a sample survey of new products which were publicized in 
consumer magazines, business periodicals, and farm journals in February 1956. A random sample of 50 issues was
selected from the 3,576 February 1956 issues of the 2,307 consumer magazines, business periodicals, and farm jour- 
nals published in the United States. 
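
The expansion from a sample of issues to the full population of issues can be sketched as follows. This is only an illustration under the assumption of simple random sampling with a finite population correction; the per-issue counts are invented, and the estimating formulae developed in the paper may differ.

```python
import math
import random

def estimate_total(sample_counts, population_size):
    """Expand a simple random sample of per-issue counts to a population total.

    Returns (estimated_total, standard_error) using the usual
    simple-random-sampling formulas with a finite population correction.
    """
    n = len(sample_counts)
    mean = sum(sample_counts) / n
    # Sample variance with n - 1 in the denominator.
    var = sum((c - mean) ** 2 for c in sample_counts) / (n - 1)
    fpc = 1 - n / population_size
    total = population_size * mean
    se = population_size * math.sqrt(fpc * var / n)
    return total, se

# Hypothetical data: new products publicized in each of 50 sampled issues.
random.seed(0)
sample = [random.randint(0, 6) for _ in range(50)]
total, se = estimate_total(sample, population_size=3576)
print(f"estimated total: {total:.0f}, standard error: {se:.0f}")
```

With only 50 of 3,576 issues sampled, the finite population correction is close to one, so the standard error is driven almost entirely by the variability between issues.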


Projecting Local Government Expenditures in Metropolitan Areas. Werner Z. Hirsch, Washington University and
Resources for the Future. 


An attempt is made to improve our understanding of the multitude of factors with complex interrelationships 
that determine expenditures of local governments. Empirical expenditure functions for specific local government
services are estimated with the aid of multiple correlation and regression techniques. The expenditure function of
a given service, e.g., public education, will have specific inputs. Inter-industry relations analysis, or also simpler
methods, provide projected values for some of the independent variables in the expenditure function as well as for
population, so that total local government expenditures can be projected.

The empirical expenditure function approach is applied to one large metropolitan area. Separate functions and
elasticities are estimated for public education, fire protection, police protection, street service and refuse
collection, amounting to about two thirds of total local government expenditures. Ten year per capita and total
expenditure projections point to minor increases in the core city if measured in real terms, while huge increases are
projected for suburbia. A more advanced projection model which would relate local government expenditures to an
area's economic activity within a unified general equilibrium system is presented and its implementation explored.

Finally, two main approaches toward the solution of suburbia’s tremendous future fiscal problems are discussed.


Is This a New Type Inflation? George P. Hitchings, Ford Motor Co.


Sharp or sustained movements of prices generally in one direction can be harmful to economic growth and stability.
Rising prices (or inflation, in the popular use of the word) do not create purchasing power unless they stimulate
greater production. They do redistribute purchasing power. In the short-run, rising prices can induce greater
spending and production in real terms, but in the long-run the opposite effect can result. The causal factors in rising
prices are often difficult to determine because of identity relationships in economic data associated with rising prices.

Inflations in the United States have generally been associated with wars. Sharply increased demands for goods 
and services during wartime have been financed to a large extent by expansion of the money supply and use of 
previously idle money. Funds available for spending by business and consumers exceeded the quantity of goods and
services available to them at constant prices. During World War II, these funds were held in check by rationing 
and price controls. The return to free markets released these funds to bid for available goods and services. 

The rise in prices since 1951 has been of a different nature. Excess money demand has not been a potent force, 
particularly in rising prices at the consumer level. The generating force for price increases has come largely from the 
supply side. Business profits per unit of production have declined since 1951. This not only indicates that excess 
demand generally was not the cause of price increases, but that costs were. The rise in unit costs has been concentrated
in labor, depreciation, indirect taxes and interest.


A Method of Projecting the Number of Households in Small Areas. William Hodgkinson, Jr., American Telephone
& Telegraph Company.


Household projections are frequently made in small area research by dividing estimates of future population
by corresponding estimates of average population per household. Estimation of the latter ratio is difficult, however,
and presents pitfalls for the unwary. The framework of the problem can be simplified if projections of both population
and households can be had for a larger area which contains the smaller. The problem then reduces to that of estimating
the future proportion which average household size in the smaller area bears to that in the larger. This proportion
is simple conceptually and possesses the advantage that in its
formation any factors will cancel out which are common to the ratio of population to households for both areas.
Such proportions are easily computed from past censuses and seem to follow consistent and logical patterns over 
long periods of time. 
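
The arithmetic of the method can be illustrated with a short sketch. The function names and all figures below are hypothetical, and the proportion is simply held at its last census value, whereas the paper projects it from its pattern over past censuses.

```python
def size_ratio_from_census(pop_small, hh_small, pop_large, hh_large):
    """Observed proportion of small-area to large-area average household size."""
    return (pop_small / hh_small) / (pop_large / hh_large)

def project_households(pop_small, pop_large, households_large, size_ratio):
    """Project households in a small area from a larger containing area.

    size_ratio is the projected proportion that the small area's average
    household size bears to the larger area's average household size.
    """
    avg_size_large = pop_large / households_large
    avg_size_small = size_ratio * avg_size_large
    return pop_small / avg_size_small

# Hypothetical census figures for a small area and the region containing it.
past_ratio = size_ratio_from_census(36_000, 10_000, 3_300_000, 1_000_000)

# Hypothetical projections of population (both areas) and households (large area).
future_households = project_households(45_000, 4_000_000, 1_250_000, size_ratio=past_ratio)
print(round(past_ratio, 3), round(future_households))
```

Because the proportion is a ratio of two household sizes, any influence that moves both areas' household sizes by the same factor drops out of it, which is the cancellation the abstract describes.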


Use of the Observation Technique in Measuring Retail Sales. Earl E. Houseman, U. S. Dept. of Agriculture and
Benjamin Lipstein, Benton & Bowles.


The merits of the observation technique (i.e., having observers stationed in stores during specified periods to record
sales as made) of measuring retail sales of selected food products are compared with the merits of the standard audit 
procedure. Both methods were used simultaneously on an experimental basis in a sample of stores in Philadelphia. 
This is a report of the results of that study, particularly matters of sample design. 


Use of BLS Price Indexes in Deflation of Value Aggregates. Sidney A. Jaffe, Bureau of Labor Statistics.


The deflation process attempts to find the physical equivalent of value aggregates in the prices of different peri- 
ods. Deflation is best done in detail and with careful consideration of the characteristics of both value and price 
data. 

The wage and clerical worker oriented CPI is built upon a sample of 300 carefully defined items priced in retail 
outlets in 46 cities. Prices collected are, however, relevant to all urban purchases and to deflation of that segment of
the personal consumption expenditures account of GNP. A larger sample of priced items and qualities would im- 
prove the CPI and facilitate deflation of purchases by all urban consumers. Likewise, extension of pricing coverage 
to rural nonfarm consumers would eliminate imputations currently necessary. The 1,900 WPI item indexes are used
selectively in deflating nonpersonal GNP, and to deflate industry shipments and inputs. Gaps in WPI price coverage 
limit such uses. In areas where custom production is important, e.g. heavy machinery and industrial equipment, the 
price data are not satisfactory. More generally, existence in primary markets of a variety of prices for an item, coupled
with the limitations of mail data collection, requires that the WPI data be used with care.
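
Deflation in detail, as recommended above, amounts to dividing each value component by its own price index before summing. The sketch below illustrates that step only; the component names, values and index levels are invented and are not BLS figures.

```python
def deflate(values, price_indexes, base=100.0):
    """Convert current-dollar values to constant dollars, component by component.

    values and price_indexes are dicts keyed by component; each index is
    expressed relative to `base` in the reference period.
    """
    return {k: values[k] * base / price_indexes[k] for k in values}

# Hypothetical expenditure components (millions of dollars) and price indexes.
current = {"food": 520.0, "apparel": 210.0, "housing": 640.0}
indexes = {"food": 104.0, "apparel": 105.0, "housing": 128.0}

real = deflate(current, indexes)
total_real = sum(real.values())
# The implicit deflator is the ratio of the current-dollar to the constant-dollar total.
implicit_deflator = 100.0 * sum(current.values()) / total_real
print(real, round(total_real, 1), round(implicit_deflator, 1))
```

The implicit deflator that results depends on the mix of components, which is one reason detailed deflation and a single aggregate index can give different answers.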


Weibull Analysis and Synthesis of Assembly and Component Reliability. Leonard G. Johnson, General Motors
Corp.
This paper discusses methods of predicting Weibull life distributions for assemblies which are built up from 
components with known Weibull parameters. The response of assembly life to a given improvement in a component 
is also studied. Convenient charts and nomographs for use by engineers and designers are included. Accurate empirical
curve fits are rapidly made with the help of high speed computer programs.
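
A minimal sketch of the synthesis step follows, assuming a series assembly (it fails at the first component failure), independent components, and two-parameter Weibull lives with survival function exp(-(t/scale)^shape). These assumptions and all parameter values are illustrative and are not taken from the paper.

```python
import math

def component_reliability(t, shape, scale):
    """Two-parameter Weibull survival probability at time t."""
    return math.exp(-((t / scale) ** shape))

def assembly_reliability(t, components):
    """Reliability of a series assembly of independent components.

    components is a list of (shape, scale) pairs.
    """
    r = 1.0
    for shape, scale in components:
        r *= component_reliability(t, shape, scale)
    return r

def assembly_life(components, reliability=0.90, hi=1e9):
    """Time at which assembly reliability falls to the given level (bisection)."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if assembly_reliability(mid, components) > reliability:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical components: (shape, scale in hours).
parts = [(1.5, 2000.0), (2.0, 3500.0), (1.2, 5000.0)]
b10 = assembly_life(parts)

# Response of assembly life to an improvement in one component:
# here the scale of the first component is doubled.
improved = [(1.5, 4000.0)] + parts[1:]
b10_improved = assembly_life(improved)
print(round(b10, 1), round(b10_improved, 1))
```

When all components share one shape parameter the assembly life is again Weibull with that shape; with mixed shapes, as here, the assembly distribution is no longer exactly Weibull, which is why the life is found numerically.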


Quarterly Estimates of Capital Formation in Canada. D. H. Jones, Dominion Bureau of Statistics.


Quarterly estimates of capital expenditures in Canada have been prepared by the Dominion Bureau of Statistics 
since 1949, and published in regular quarterly reports on “the National Accounts.” The estimates are obtained by 
projecting annual data, derived mainly from direct surveys, on the trend of monthly or quarterly statistics of manu- 
facturers’ shipments, imports, housing starts and completions, employment reported by construction contractors, 
and similar data. Following a brief description of methods and particular problems encountered in using this 
method, the results are compared with preliminary and revised series based on direct surveys, as published in re- 
vised annual reports on the “National Accounts,” and annual reports on the “Investment Outlook.” For this pur- 
pose, estimates for the first three quarters of each year, as originally published, are used to prepare an advance 
estimate for the full year, and this advance estimate is then compared with the direct survey results. The comparison 
shows that the quarterly series has in the past provided a reasonably good advance estimate of aggregate capital 
expenditure. Performance of the quarterly series in this respect is judged to be at least as good as that of preliminary 
estimates based on direct surveys. 


A Statistical Model for Evaluating the Reliability of Safety Systems for Plants Manufacturing Hazardous Products. 
Louis B. Kahn, Shell Development Company.


This paper outlines a technique for determining the probability of failure of a plant manufacturing a hazardous 
product. For the failure of the plant to occur, the failure of the safety system must precede the failure of the
plant's critical operating system in any “turnaround” or inspection period; where the failure of the operating system 
would precipitate a hazardous situation. The probability of plant failure is calculated on this conditional basis. 
Derivations and applications of probability of plant failure functions will be illustrated for differently designed 
safety systems, showing the probable risk of disaster, where the components are given first in series and then in 
parallel. Under the assumption that the exponential theory of failure applies, probabilities will be calculated, show- 
ing the effect of changing component reliability and turnaround periods, giving an insight into the problem of re- 
dundancy. Examples of disastrous occurrences in industrial practice will be cited, which point to the need and appli- 
cability of a scientific approach to safety system evaluation in these industrial areas. 
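
The conditional calculation can be sketched numerically. The code below assumes independent exponential failure times, restoration of both systems to new condition at each turnaround, and a safety system built from components either in series or in parallel. The function names and failure rates are hypothetical, and the integral is evaluated by Simpson's rule rather than by the closed-form functions derived in the paper.

```python
import math

def series_cdf(rates):
    """Failure-time CDF of a safety system that fails when any component fails."""
    total = sum(rates)
    return lambda t: 1 - math.exp(-total * t)

def parallel_cdf(rates):
    """Failure-time CDF of a safety system that fails only when all components fail."""
    return lambda t: math.prod(1 - math.exp(-r * t) for r in rates)

def plant_failure_probability(safety_cdf, operating_rate, period, steps=2000):
    """P(safety system fails before the operating system, both within one period).

    Integrates f_operating(t) * F_safety(t) over (0, period) by Simpson's rule;
    steps must be even.
    """
    h = period / steps

    def g(t):
        return operating_rate * math.exp(-operating_rate * t) * safety_cdf(t)

    total = g(0.0) + g(period)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3

# Hypothetical failure rates per year and a one-year turnaround period.
rates = [0.2, 0.2]
p_series = plant_failure_probability(series_cdf(rates), operating_rate=0.05, period=1.0)
p_parallel = plant_failure_probability(parallel_cdf(rates), operating_rate=0.05, period=1.0)
print(p_series, p_parallel)
```

Comparing the two results for the same components shows the effect of redundancy, and rerunning with a shorter period shows the effect of more frequent turnarounds.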


Estimation of an Average Life Parameter from a Sample from Mixed Populations. Leo Katz, Michigan State University.

In the early stages of life testing of a new product, the objects under observation can be considered to be a mix- 
ture of properly constructed product and an unknown number of improperly made pieces which are expected to 
fail fairly early in testing. An inexpensive product may be purified by pre-testing to remove most of the contaminat- 
ing population. When equipment is very expensive, however, it usually becomes necessary to make estimates based 
on the earliest observations available, a substantial portion of which may be from the improperly-made population. 
We present a general procedure designed to produce unbiased estimates of the mean-length-of-life parameter of 
the population of properly-made product. We further indicate how a computer may be used to evaluate the opera- 
tional effectiveness of the procedure for particular mixtures of two exponential populations. 


Experiments Versus Surveys. Oscar Kempthorne, Iowa State College.


Experiments and surveys are different basic techniques for attacking some questions in the social sciences,
and the purpose of this talk is to compare their relative merits. The basic structure of a randomised experiment 
will be reviewed, and the aspects of survey examination of effects of treatments which are the same and are different 
in experimental examination will be discussed. 

In an experiment a set of reproducible operations constitute an imposed treatment. In analysis of a survey, 
some units found in the sample have one set of characteristics and are classified as having received a particular 
treatment, others have a different set of characteristics and are classified as having a different treatment. Difficulties 
arise from two sources in an experiment: (1) the treatments may not be perfectly reproducible (2) the effects of the 
treatments may be non-additive. In analysis of a survey the difficulties arise from three sources: (1) the “treat- 
ments” are not well defined and are not constant, (2) the effects of “treatments” are not additive and (3) the associa- 
tion of “treatments” with attributes of units of the population is not random. The roles of these aspects will be 
discussed. 


Domestic Implications of Postwar Productivity Trends in the United States. John W. Kendrick, George Washington
University.


Between the first and most recent business cycle peaks of the postwar period, 1948 and 1957, total factor pro- 
ductivity in the private domestic economy increased at an average annual rate of 2.1 per cent, the same rate as 
prevailed over the longer period 1919-1957. The more conventional measure, real product per manhour, shows an 
increase of about 3 per cent a year over the same period, a bit higher than its longer-run trend because of the high
postwar rate of increase in capital per manhour. There has apparently been no significant acceleration in the trend-rate
of productivity advance since the war. Since 1948, productivity has accounted for more than half of the rise
in real product per capita; input per capita rose somewhat faster than over the longer period primarily because of 
the high postwar investment rate. 

There has been considerable dispersion in rates of productivity change by industry group, although a bit less 
than in some earlier periods. Relative rates of productivity change appear to be significantly correlated with relative 
rates of change in output, with the relative amplitudes of cyclical fluctuations in output, and with the relative 
magnitude of outlays for research and development. Those industries with larger than average increases in productivity
have tended to show smaller than average price increases. This negative correlation between relative changes
in productivity and in prices helps explain the positive correlation between relative changes in productivity and in 
output. The relative increases in output have been sufficient to more than offset the relative increases in produc- 
tivity, however, and the employment of both capital and labor has tended to increase more than average in the 
technologically more progressive industries.


Business and Babies, the Relations Between Economic Fluctuations and Birth Rates. Dudley Kirk, Population
Council.


This paper is an examination of the extent to which economic fluctuations are followed by comparable changes 
in birth rates, given an appropriate time lag. The influences of several economic measures are tested including 
indices of industrial activity, of personal income in constant dollars, of employment, and of consumer prices. In 
each case indices have been correlated with marriage rates (marriages per 1000 unmarried women at ages 15-44) 
and birth rates (births per 1000 women at ages 15-44). 

An attempt is made to measure the apparent effects of economic fluctuations on natality both directly and,
indirectly, through their effects on the number of current marriages. The analysis covers the period 1920—57 (annual 
data) with more specific analysis of monthly and quarterly data for the post-war period. 


The Sample Surface in Variables Sampling. R. L. Kirkpatrick and R. D. Summers, Bendix Aviation Corp.


A geometric visualization of the decision criteria in variables sampling readily suggests a number of useful
applications. This paper is primarily devoted to the description of these applications and to graphs and tables to 
facilitate their use. 

Constructing an ordinate $\bar{x} - ks$ at each point in the sample space, a surface (the “sample surface”) is generated.
“Rejection numbers” are determined such that if the number of items within a sample beyond the specification 
limit is equal to or greater than the rejection number, the variables criterion must reject the lot. These rejection 
numbers may be used to: 1. Reduce computations. 2. Compute limiting OC curves for departures from the nor- 
mality assumption. 3. Prepare improved combined variables-attributes sampling plans permitting rejection on the 
first sample. 4. Curtail sampling. 

Transforming the sample surface, a graphical means of screening for extremely good or poor lots is developed 
based on the sample maximum and minimum which eliminates computations. Using a previous development by 
John E. Walsh, the effectiveness of this procedure is investigated. The construction of sampling plans whose de- 
cision criteria depend only on maximum and minimum sample values is discussed. 
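
The ordinate that generates the sample surface can be computed directly for any one sample. The sketch below assumes a single lower specification limit and the usual variables criterion of accepting the lot when the sample mean minus k times the sample standard deviation is at or above that limit. The sample values, the limit and k are invented, and the rejection numbers discussed in the paper are not computed here.

```python
import statistics

def surface_ordinate(sample, k):
    """Ordinate of the sample surface at this sample point: x-bar minus k*s."""
    return statistics.mean(sample) - k * statistics.stdev(sample)

def variables_decision(sample, lower_limit, k):
    """Accept the lot when the ordinate is at or above the lower limit."""
    return surface_ordinate(sample, k) >= lower_limit

def count_below(sample, lower_limit):
    """Number of sampled items beyond (below) the specification limit."""
    return sum(1 for x in sample if x < lower_limit)

# Hypothetical measurements, specification limit and acceptance constant.
sample = [10.0, 10.0, 10.0, 10.0, 12.0]
print(surface_ordinate(sample, k=1.5))
print(variables_decision(sample, lower_limit=9.0, k=1.5))
print(count_below(sample, lower_limit=9.0))
```

A rejection number in the paper's sense would be the smallest value of `count_below` at which `variables_decision` is False for every sample of that size, whatever the remaining observations are.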


Problems in the Quantitative Appraisal of the Influence of Government Programs on Prices. James W. Knowles,
Joint Economic Committee, U. S. Congress.


The quantitative appraisal of the influence of Government programs on prices requires an analytical frame- 
work dimensionally defined so that a measurement process can be applied to each dimension. Government programs, 
therefore, must be analyzed and measured as processes so that the measurements can be used to analyze the rela- 
tionships over time of government policies to private economic processes which lead to the determination of prices. 
These ideas are familiar in the field of tax incidence and the study of monetary policies though achievements have
been more meager than the growth of our requirements for policy purposes. 

This type of quantitative analysis and measurement has not been applied to Government expenditures in the 
detail needed and hardly at all to regulatory policies. Many major problems in this area arise because price measures 
have not been designed to fit an analytical scheme relating private pricing activities to government policies; espe- 
cially as regards the economic structuring of the indexes, the handling of problems of quality change, and the 
measurement of prices of government purchases. 


Community Adjustment of Former Mental Patients and Needed Steps for Their Assistance. Else B. Kris, N. Y.
State Dept. of Mental Hygiene.
Intensive aftercare of patients released from mental hospitals has made it possible to reduce the rate of rehospitalization
considerably. In a four-year follow-up study during which 400 patients were carefully observed, it
was possible to keep the return rate to the hospital at about 10% as compared to the generally reported rate of about
30-35%. A considerable number of patients was able to successfully handle duties as home makers, thus bringing
children back to the parental homes. Many former patients, even those with a history of hospitalization over several
years, were able to succeed in obtaining gainful employment. Pharmacotherapy seemed not to interfere with work
capacity and work performance, nor did this cause any accident proneness. No untoward side effects due to prolonged
continuation of drug therapy were observed. Some of the most pressing existing needs to make for better
community adjustment of former mental patients are: Better social planning prior to the patient's release, better
education of employers and the public in general, Vocational Rehabilitation Services, better organized family care
programs, etc.


On an Operational Emphasis in the Management of Applied Research. Boyd Ladd, Johns Hopkins University.


Great advances have been made in techniques of management science during recent years. But the manage- 
ment of technical activities continues to offer very substantial opportunity for improvement. More effective utilisa- 
tion of the supply of human resources in scientific and engineering fields is nationally significant. 

At the project level, approaching an optimum in management practices is equally of concern. The Executive 
responsible for the research program, the Research Manager, and the individual engineer are joint participants 
in the management of research. This shared responsibility for the vigor and intensity with which the research is 
prosecuted is a central determinant of its effectiveness. 

Some principles for management practice are deduced. A formulation of the problem of project management
within the context of an integrated broader research program, and in terms of dynamic considerations over time, is
proposed. Next steps in management analysis, consistent with research freedom and with intelligent utilization
of scarce resources, can be inferred.


Statistics in the Business Administration Curriculum. Maurice W. Lee, University of North Carolina. 


The undergraduate statistics course today occupies a position which can be most easily compared with that 
of foreign languages in a great majority of the doctoral programs in American universities. It comes too late in the 
student’s program; its relevance is not apparent to him; it has token support from the rest of the faculty but as
with foreign languages almost none of them employ this tool in their own courses; the course itself is generally a 
deadly-dreary exercise in formula memorization and data processing routines. 

At the MBA level statistics is either not required at all or is introduced again as a pro forma requirement with- 
out relevance to the rest of the program. 

From this set of premises the present paper proceeds to a consideration of possible solutions to the problem, 
including the suggestion that the most effective action may be the total withdrawal of statistics from the curriculum 
of business administration students, both graduate and undergraduate. The remainder of the paper deals with 
various suggestions for bringing statistics back into the picture in a different climate more favorable to its proper 
role in the discipline. This is explored separately in the cases of the undergraduate and graduate curricula. 


A Business Viewpoint on the Adequacy of Monetary and Financial Statistics. Wesley Lindow, Irving Trust
Company.


Although we have a wealth of banking and financial statistics they are not always as good as they seem, and
they do not answer many questions. Consider bank loan data for example. The commercial and industrial category
is widely accepted as the measure of business loans and as such is often used for business cycle analysis. Yet this
group includes loans which careful analysis suggests should be elsewhere. Examples include loans to sales finance 
companies and loans for warehousing mortgages. The former probably should be taken out of commercial loans 
and combined with loans to personal finance companies. The warehousing loans probably should be moved to the 
real estate category. 

Some kind of basic distinction also probably should be made between loans for money market purposes and 
other loans. Then non-money market loans should be broken down by industries in ways which would improve 
on the present industrial classifications of commercial loans which have several weaknesses. 

Statistical data are also not really adequate in several other areas such as federal funds, securities sold on 
repurchase agreements, pledged securities, interest rates charged on loans, bank earnings and expenses, and the 
data underlying the financial ratios used to analyze banks. In some of these cases data are not internally comparable, 
in others they do not exist to a desirable degree, and in still others they seem to invite misuse. 


A Suggested Method for the Delimitation of Population Clusters. Arthur F. Loeben, Planning Commission,
Montgomery County, Pennsylvania. 


The Stead, used in the sense of homestead, farmstead, storestead, and gasoline station stead is used as a unit 
of analysis. The stead is the combining unit which in the aggregate constitutes clusters of agglomerations of settle- 
ment, and such clusters are delimited by two different but related approaches. A Nucleated Area is defined as a 
minimum of 5 steads, spaced within 250 feet of each other and has the shape of the steads enclosed. The steads 
in a landscape which do not nucleate are plotted as dispersed steads. A D-Line method of analysis is applied to
the dispersed steads, and the resulting Agglomerated Areas are delimitations of concentrations or clusters. An
Agglomerated Area has 3 steads each within D distance of the other two. Often, Agglomerated Areas of varying size
and shape group together and form a unit. By varying D the elements of the settlement landscape are organized at
different levels of analysis and generalization. The Gettysburg, Pennsylvania area has been treated and mapped as
a sample area following such procedures. Any area so mapped is comparable to others, and is objectively delimited.
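
One possible reading of the Nucleated Area rule can be sketched in code: steads are linked whenever they lie within 250 feet of one another, and any linked group of at least 5 steads is treated as a Nucleated Area, the rest being dispersed. This chaining interpretation, the function names and the coordinates are assumptions made for illustration; the paper's own procedure, including the D-Line analysis of dispersed steads, may differ.

```python
import math

def link_clusters(points, max_gap):
    """Group points into chains in which each point lies within max_gap of another."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group's representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def nucleated_and_dispersed(points, max_gap=250.0, min_steads=5):
    """Split stead indices into Nucleated Areas and dispersed steads."""
    clusters = link_clusters(points, max_gap)
    nucleated = [c for c in clusters if len(c) >= min_steads]
    dispersed = [i for c in clusters if len(c) < min_steads for i in c]
    return nucleated, dispersed

# Hypothetical stead coordinates in feet.
steads = [(0, 0), (200, 0), (400, 0), (600, 0), (800, 0), (5000, 5000)]
print(nucleated_and_dispersed(steads))
```

The same linking routine could be rerun on the dispersed steads with a larger distance D to explore coarser levels of generalization.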


The Philadelphia Land-Use Inventory. Harlin G. Loomer, Philadelphia City Planning Commission.


The Philadelphia Land-Use Inventory, now being developed by the City Planning Commission in cooperation 
with several other City Departments, consists of a complete listing of every parcel of real property in the City of 
Philadelphia (approximately 650,000), on I.B.M. cards and print-off. Primary data for each parcel show lot size, 
building coverage, type of structure, zoning, occupancy (S.I.C. code), activity conducted on premises (A.O.P. code)
and assessed value (land, buildings and total). Appropriate additional data are shown for each type of different use
group, i.e., floor space, number of employees, off-street parking or loading facilities, etc., for nonresidential properties;
number of dwelling units, tenure, occupancy, etc., for residential. Derived data such as value per square foot,
floor/ground area ratio, floor/employee ratio, etc., follow on trailer decks. All cards are coded for location by ward, 
census tract, block and street frontage. Perpetual up-dating is provided for by coordination of this series of I.B.M. 
cards with those of other Departments to which changes are reported, such as the Department of Records, Board 
of Revision of Taxes, Department of Licenses and Inspections, etc. The system also provides for cross-matching
with any source of data listed by standardized street address. 


Inequalities for Stochastic Linear Programming Problems. Albert Madansky, Rand Corp.


Inequalities for the value of the objective function in two types of stochastic programming problems are ob- 
tained. In these problems only the right-hand side of the problem is assumed to be random with known expected 
value, 


Multiple Regression Involving Continuous Processes. Irwin Miller, U. S. Steel Corp.


It is assumed that $x_1(t), \ldots, x_p(t)$ represent continuous input processes, and $y(t)$ represents a continuous output
process. The relation between $y(t)$ and the $x_i(t)$ is linear, but dependent on the lag times $\tau_1, \ldots, \tau_p$ as follows


$$y(t) = \beta_1 x_1(t - \tau_1) + \beta_2 x_2(t - \tau_2) + \cdots + \beta_p x_p(t - \tau_p) + \epsilon(t).$$


In this formula, $\epsilon(t)$ is a stationary stochastic process with zero mean and autocovariance function $R(\tau)$. It can be
shown that for the variance of the residual $\epsilon(t)$ to be minimized, $\beta_1, \ldots, \beta_p$ and $\tau_1, \ldots, \tau_p$ must satisfy the $2p$ equations


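$$R_{iy}(\tau_i) = \sum_{j=1}^{p} \beta_j R_{ij}(\tau_i - \tau_j), \qquad R'_{iy}(\tau_i) = \sum_{j=1}^{p} \beta_j R'_{ij}(\tau_i - \tau_j), \qquad i = 1, 2, \ldots, p,$$


where $R_{ij}(\tau)$ is the cross-covariance function of the processes $x_i(t)$ and $x_j(t)$, the $R_{iy}(\tau)$ are the cross-covariance
functions of $x_i(t)$ with $y(t)$, and primes denote differentiation with respect to $\tau$.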

In case the x;(t) are uncorrelated, i.e., Rij(r)=0 (i, 7 =1,-** , p), an exact solution is obtained. In case the 
2;(t) are correlated, an iterative scheme is available for obtaining approximate estimators for the #’s and r's. 

An example is presented to show the decrease in the residual error (variance of ¢(t)) obtained by estimating 
the lag time 7;,-** , rp as opposed to performing a standard multiple regression analysis without regard to the lags. 
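
The gain from estimating lags can be illustrated with a minimal sketch. It is not the author's procedure: it assumes discretely sampled series, two inputs, integer lags, and a plain grid search over lag pairs with ordinary least squares at each pair. The function name `fit_with_lags` and the synthetic data are hypothetical.

```python
import math
from itertools import product


def fit_with_lags(x1, x2, y, max_lag):
    """Grid-search integer lags and fit y[t] = b1*x1[t-tau1] + b2*x2[t-tau2].

    Returns (residual_variance, tau1, tau2, b1, b2) for the best lag pair.
    """
    best = None
    for tau1, tau2 in product(range(max_lag + 1), repeat=2):
        # Same time window for every lag pair, so residual variances are comparable.
        rows = [(x1[t - tau1], x2[t - tau2], y[t]) for t in range(max_lag, len(y))]
        s11 = sum(a * a for a, _, _ in rows)
        s22 = sum(b * b for _, b, _ in rows)
        s12 = sum(a * b for a, b, _ in rows)
        s1y = sum(a * c for a, _, c in rows)
        s2y = sum(b * c for _, b, c in rows)
        det = s11 * s22 - s12 * s12
        if abs(det) < 1e-12:
            continue  # inputs are collinear at this lag pair
        # Solve the 2x2 normal equations by Cramer's rule.
        b1 = (s1y * s22 - s12 * s2y) / det
        b2 = (s11 * s2y - s12 * s1y) / det
        rss = sum((c - b1 * a - b2 * b) ** 2 for a, b, c in rows)
        candidate = (rss / len(rows), tau1, tau2, b1, b2)
        if best is None or candidate[0] < best[0]:
            best = candidate
    return best


# Hypothetical sampled data: y depends on x1 lagged 3 steps and x2 lagged 1 step.
n = 200
x1 = [math.sin(0.3 * t) for t in range(n)]
x2 = [math.cos(0.11 * t) for t in range(n)]
y = [2.0 * x1[t - 3] - 1.5 * x2[t - 1] if t >= 3 else 0.0 for t in range(n)]

with_lags = fit_with_lags(x1, x2, y, max_lag=5)  # lags estimated
no_lags = fit_with_lags(x1, x2, y, max_lag=0)    # standard regression, lags ignored
print(with_lags[1:3], with_lags[0], no_lags[0])
```

Comparing the first element of `with_lags` and `no_lags` shows the reduction in residual variance that comes from fitting the lags rather than ignoring them.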


Early Failures in Life Testing. Rupert G. Miller, Jr., University of California, Berkeley.


Data from certain life test experiments exhibit such an unusually high concentration of failures near time 0 
that the assumption of an over-all exponential density is unwarranted. A reasonable hypothesis for this phenomenon 
is that due to faulty construction or defective parts certain test items fail prematurely; these items are termed 
“early failures.” To handle this type of data an early failure model is postulated in which one failure rate is assumed 
to be in effect for an initial time interval $(0, T_0)$ and another, lower failure rate is operative thereafter. Estimators
for the two failure rates are given in the case where $T_0$ is known and in the case where $T_0$ is not known exactly
but can be assumed to be within a specified interval. Methods for obtaining approximate large sample confidence
regions are outlined, and procedures for handling small samples are described.
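
For the case where $T_0$ is known, a two-rate model of this kind can be sketched as follows. The abstract does not state the author's estimators; the sketch uses the standard maximum likelihood estimates for a piecewise-constant failure rate (failures in each phase divided by total time on test in that phase) and assumes every item is observed until it fails. The function name and the data are hypothetical.

```python
def two_phase_failure_rates(failure_times, t0):
    """Estimate the early and late failure rates for a known change point t0.

    Each rate is (number of failures in the phase) / (total time on test in the phase).
    Assumes no censoring: every item is observed until failure.
    """
    early_failures = sum(1 for t in failure_times if t <= t0)
    late_failures = len(failure_times) - early_failures
    # Every item is exposed to the early rate until it fails or reaches t0.
    early_exposure = sum(min(t, t0) for t in failure_times)
    # Only survivors past t0 are exposed to the later rate.
    late_exposure = sum(t - t0 for t in failure_times if t > t0)
    early_rate = early_failures / early_exposure
    late_rate = late_failures / late_exposure if late_exposure > 0 else float("nan")
    return early_rate, late_rate


# Hypothetical life-test data (hours) with a cluster of failures near time 0.
times = [0.5, 1.0, 1.5, 12.0, 22.0, 32.0]
early_rate, late_rate = two_phase_failure_rates(times, t0=2.0)
print(early_rate, late_rate)
```

With these made-up data the early rate is about 0.33 per hour against 0.05 per hour afterwards, the pattern the early failure model is meant to capture.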


The Problem as Faced by the Federal Agencies. Thomas J. Mills, National Science Foundation.


It may be said that there are two principal facets to the Government's concern on the matter of shortage of 
scientists and engineers. In the first place the Government, as the largest employer of such personnel, is concerned 
that adequate numbers are available for employment on those programs which it administers directly with its own 
employees. Such programs range over all the sciences and require personnel at all levels from the novice in a bio- 
logical laboratory to the senior scientist responsible for a highly sophisticated missile system. While size and com- 
plexity greatly complicate this problem, its solution is probably not greatly different from that to be found by any 
employer competing for a scarce resource. 

The second facet lies in the ill defined area of “public policy.” Government, as the instrument through which 
public policy is implemented, has the ultimate responsibility of assuring that the supply of scientists and engineers 
is sufficient for all purposes recognized as important. It is now generally accepted that the economic, political and 
military threats to the Nation require the strongest kind of scientific and technological defense. Government has the 
role of making certain that resources are at hand for that purpose. This requires a capability for determining present 
manpower requirements and future growth trends in order that the action programs adopted will be effective in 
meeting scientific manpower needs while disrupting desirable present training programs as little as possible. 


Some Training Needs of Foreign Statisticians. John W. Morse, Hobart and William Smith Colleges.


A shorter more provocative subtitle might be “Philosophy and Filing.” These two radically diverse but critical 
points are discussed, within the framework of the problem of human progress, in an endeavor to gain some insights 
as to why the statistical methods are not more widely and more effectively employed in the world. In taking a 
long-range, broad view of our profession, we are reminded of our vital link in the scientific method which provides 
a systematic way for learning and progressing. Hypotheses are advanced as to why we are not more active in pro- 
moting the relatively new philosophy which deals with uncertainty and helps to remove the wall between the physi- 
cal and social sciences, and to facilitate the much needed advancement of the latter. At the other extreme attention 
is called to the frequently neglected organizational and administrative aspects of statistical work. Here too, some
tentative hypotheses and suggestions are offered.


Use of Computers in Linear and Non-Linear Statistical Estimation. Mervin E. Muller, Princeton University
and I.B.M. 


A brief review of linear estimation via a digital computer is given, including consideration of procedures to 
assist with the estimation of parameters when the design matrix, while not singular, is poorly conditioned. Attention
is given to the question of whether procedures for the analysis of variance should utilize possible orthogonalities in 
the design. The importance of residual analysis is stressed. 

Several possible approaches to non-linear estimation are outlined and explicit attention is given to a procedure 
developed by G.E.P. Box and members of the Statistical Techniques Research Group at Princeton University. In
particular, the IBM 704 program for non-linear estimation which has been developed in cooperation with IBM and 
programmed by G. Booth and T. Peterson is presented. The implications and uses of this program are discussed. 
These include: (1) the consideration of the role of a preliminary analysis of an approximation to the moment matrix;
(2) the actual estimation procedure, noting its use when the statistical model is specified by a system of differential
equations; and (3) the presentation and use of linearized confidence contours of the parameters. The paper concludes 
by considering the problem of putting data into a computer and the need for techniques to detect errors in the data. 


Nominal Confidence Limits for the Expectation of a Poisson Variable. Nilan Norris, Hunter College and New York
University.


Because of the discontinuous character of the distribution, nominal confidence limits for the expectation of a 
Poisson variable are wider than the maximum true limits, especially for small values of expectations, m =np. 
Biometrika Tables for Statisticians, Vol. I, p. 203, express the nominal limits to three significant figures for the 
smaller expectations. Various proposals for continuity corrections have not been widely accepted. If the conservative 
feature of the nominal limits is acceptable, and if precepts for the use of significant figures are obeyed, occasionally 
there is a need for a table in which expectation limits are given to at least four figures. For observed sample occur-
rences varying from c = 0 to 14, a table is prepared for the upper and lower nominal confidence limits of the expected
number of occurrences per unit. All nominal limits are given to at least four figures for confidence coefficients of 
.998, .99, .98, .95, and .90. Conversion factors are used to obtain from standard chi-square tables the nominal limits 
of the expectations. Although the conversion between the Poisson and the theoretical chi-square distribution is 
exact, the nominal confidence limits so established for the Poisson are wider than the maximum true limits, particu- 
larly for small expectations. 
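
A minimal sketch of how such nominal limits can be computed follows. It does not reproduce the author's conversion factors or table; it assumes an equal-tailed interval and solves the Poisson tail equations directly by bisection. This gives the same limits as the standard chi-square relation (lower limit equal to half the lower-tail chi-square quantile with 2c degrees of freedom, upper limit equal to half the upper-tail quantile with 2c + 2 degrees of freedom). All function names are hypothetical.

```python
import math


def poisson_cdf(c, m):
    """P(X <= c) for X ~ Poisson(m); returns 0.0 for c < 0."""
    if c < 0:
        return 0.0
    term = math.exp(-m)
    total = term
    for k in range(1, c + 1):
        term *= m / k
        total += term
    return total


def bisect_decreasing(f, lo, hi, tol=1e-10):
    """Root of a decreasing function f on [lo, hi], where f(lo) > 0 > f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def nominal_limits(c, confidence):
    """Equal-tailed nominal confidence limits for the Poisson expectation, given c occurrences."""
    tail = (1.0 - confidence) / 2.0
    hi = c + 100.0  # generous search bound for the small counts considered here
    # Upper limit: the expectation m at which P(X <= c | m) falls to the tail probability.
    upper = bisect_decreasing(lambda m: poisson_cdf(c, m) - tail, 0.0, hi)
    if c == 0:
        lower = 0.0  # no positive expectation can be excluded from below
    else:
        # Lower limit: the expectation m at which P(X >= c | m) rises to the tail probability.
        lower = bisect_decreasing(lambda m: poisson_cdf(c - 1, m) - (1.0 - tail), 0.0, hi)
    return lower, upper


for c in range(0, 3):
    print(c, nominal_limits(c, 0.95))
```

For c = 0 the upper limit has the closed form -ln(tail), which offers a quick check on the numerical solution.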


What the Tranquilizing Drugs are Doing to the Population in Mental Hospitals. Robert E. Patton, New York
State Department of Mental Hygiene. 


For fifty years until 1955 the patient population of the New York State mental hospitals had climbed steadily 
from 28,000 to 93,000. Early in 1955 the large-scale use of tranquilizing drugs began in these hospitals. In the three 
years since then the patient population has decreased to 89,000 in spite of an increase in admissions. The issue is 
whether the relationship between the use of tranquilizing drugs and the population drop is coincidental or causal. 
Evidence is given to show that the population decrease is due primarily to an increase in the number of patients 
released and that this increase in releases is probably related to the use of these drugs. 

The use of tranquilizing drugs is now being accompanied by an intensified treatment program and by an ex- 
pansion of the open hospital policy. The future effect of this expanded program on patient population is estimated 
on the basis of present trends. The characteristics of the patient population will continue to change rapidly. There
will be increased proportions of both youthful and aged patients in the hospitals, where they will stay for much
shorter periods of time. The number of patients admitted will continue to increase. The range of conditions treated
will expand. In summary, the mental hospitals will continue to shift their emphasis from custodial care to active 
treatment. 


Financial Statistics and Financial Policy. J. J. Polak, International Monetary Fund.


In recent years money and banking type statistics have been extended to cover other financial institutions and 
the lendings and borrowings of sectors, thus producing, once the income accounts are themselves sectored, integrated 
income and financial statistics. These statistics are significant if they provide a framework for the integration of in-
come and monetary analysis. They can do this only if they are constructed in accordance with an integrated income
and monetary theory. Net lendings and borrowings provide an alternative, and possibly better, measure of the
“savings” and “investment” variables of the income analysis. Integration of income and monetary theory, however,
requires a recognition of the largely autonomous nature of money, a theory of the relationship of money to income 
and of money to other assets, and an understanding of the economy's attempts to maintain these relationships. 

For policy making purposes changes in money provide a more useful autonomous variable than “investment.” 
Money and banking statistics by themselves are useful for policy purposes. The value of complete statistics of 
financial transactions, and ultimately of sector balance sheets, lies in the fact that lendings are acquisitions of finan-
cial assets for which lenders have a pattern of preferences. The monetary authorities are the largest source of change
[...]ng from fi[...] but complete financial statistics show how preferences for financial assets translate monetary
changes into lendings and provide the possibility to observe changes in preferences and their effects. 


Some Reconstruction and Estimation Problems in Historical Statistics. Ernest Rubin, American University and
Howard University. 


The present purpose of this paper is to indicate certain problems in obtaining estimates of historical statistics, 
with specific reference to U. S. immigration, emigration and foreign-born data. This discussion is an examination
of the problems encountered in making estimates for these series, going back to 1790, the date of the first U. S.
census. Certain official parameter information is available in the form of decennial enumerations of the white popu- 
lation, and for certain periods of interest, of the foreign born white. In obtaining estimates of emigration for the 
period 1850-1907, for example, the census enumerations of the foreign born white prove to be useful parameters. 
A number of mortality tables of the U. S. white population have been derived for certain decades of the 19th cen-
tury and there are also available English mortality data and life tables of private insurance companies in the U. S.
and in England. Information on the fertility of American women throughout the 19th century is also available 
and useful in obtaining residual aggregates of population. These residual aggregates may be regarded as estimates 
of net migration prior to 1850. 

There are also useful auxiliary data on the international flows of goods to the U. S. from Europe, maritime
developments and records, and the official statistics of migration (to and from the U. S.) in some foreign countries
in the 19th century. Of special interest and value are the nationality studies of the principal migrant groups, both
in the U. S. and abroad. The problem of constructing these historical statistics becomes one of synthesizing a wide
variety of materials (official and private), of preparing correlations, of employing sampling methods when relevant,
and of utilizing business cycle information. 


The Pattern of Local Government Finances in the Cleveland Metropolitan Area. Seymour Sacks, Cleveland Metro- 
politan Service Comm. and Rensselaer Polytechnic Institute. 


This paper presents the results of a comprehensive study of local government finances in the Cleveland Metro- 
politan Area for 1956 and some earlier years. The immediate focus of this paper is on the pattern of finances, par- 
ticularly expenditures within the area, and reasons for such a pattern. A secondary focus of this paper, but one per- 
haps of greater importance to students of regional and metropolitan problems, is on changes in the pattern of 
finances. Where the emphasis in the former case is on per capita expenditures, in the latter case the emphasis is on 
changes in the level of expenditures. One of the weakest links in our knowledge of metropolitan change involves the 
role of local governments, and in the fields of public finance and local government the factors determining that role. 
This paper is designed to throw light on these problems in terms of a specific metropolitan area at a certain time. 
But it is hoped that the results will provide hypotheses for the study of other metropolitan areas individually, 
and in toto as well. 

How does the actual pattern of per capita (and per student) expenditures accord with the hypothesis that 
increasing size and increasing costs are concomitant with each other? Statistical and mapping evidence are intro-
duced to show serious flaws in the hypothesis. (1) The hypothesis is not complete since it leaves out the single most
important function, education; (2) the hypothesis does not take into account the possibilities of economies arising 
from size or from specialization of function; (3) the hypothesis takes the municipality out of its proper context, i.e., 
the metropolitan area. The use of multiple correlation analysis provides a powerful tool for analyzing the sources of
variation in per capita expenditures. Three variables were chosen for the multiple correlation analysis: population,
per capita assessed valuation, and per capita wealth. Municipalities were grouped according to political and economic
classifications. 

Multiple correlation analysis is used to explain changes in the level of expenditures during a period of great 
metropolitan growth. The pattern of expenditures is compared to the pattern of tax rates. 

Coefficients are used in explaining the relative importance of changes in population, changes in assessed 
valuation, and changes in wealth, relative to changes in the expenditure pattern. 


Interim Report on the Study of Future Fertility of Two-Child Families in Metropolitan America. Philip C. Sagi,
Robert G. Potter, Jr., Charles F. Westoff, Princeton University.


The paper summarizes three aspects of a current longitudinal study of the future fertility of two-child couples 
living in metropolitan areas. The three aspects include the methodology of the study, some selected findings, and 
the next phase of the study. The section on substantive findings touches on hypotheses relating (1) religion to fer- 
tility desires (2) fertility desires to measures of acceptance of mother role (3) fertility desires and spacing control to 
personality characteristics and (4) religion to fertility control. 


Development of Statistics Relating to Research and Development Activities in Private Industry. Kenneth P.
Sanow, National Science Foundation. 


Several time series on industrial research and development activities have been prepared prior to the National
Science Foundation sponsored surveys. The two principal series cover the periods 1920-40 and 1941-52. These
two series suffer from statistical limitations inherent in their basic data source and cannot be linked to each other
because of varying statistical techniques applied to personnel and cost data.

The first detailed comprehensive survey of industrial research and development covering the years 1953-54, 
was sponsored by the National Science Foundation and conducted by the Bureau of Labor Statistics. A somewhat 
similar survey covering the years 1955-56 was recently completed. The 1957 survey of industrial research and de-
velopment expenditures was conducted by the Census Bureau in order that other economic data collected by the
Census Bureau could be related to research and development expenditures. Succeeding industrial surveys will be 
conducted on an annual basis. 

A number of interrelated accounting, definitional and response problems still plague the present NSF sponsored 
surveys. Several programs are underway within the National Science Foundation to amplify and supplement these 
overall studies. A few suggestions are made for additional areas where further investigation would prove fruitful. 


Problems of Allocation. Maurice W. Sasieni, Case Institute of Technology.


Problems of allocation arise whenever there are a number of activities to be performed and limited resources
prevent our performing each separate task in the most effective way conceivable. In these circumstances we have 
to decide how much of each resource should be allotted to each task. The solution of this problem falls into two parts. 
First we have to express the payoff and the constraints on the use of resources in terms of the allocations to be made. 
Secondly we have to perform mathematical analysis to determine the set of allocations which maximize the payoff 
function, while still satisfying the constraints. This paper is concerned with the second of these. We will review 
briefly three mathematical techniques—linear programming, convex programming and dynamic programming. 
Some examples of each are given. 
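
As an illustration of the third technique, dynamic programming, the following sketch allocates a limited number of units of a single resource among several tasks. It is not taken from the paper: the payoff tables are hypothetical, and the model assumes whole units of one resource with a separate payoff table for each task.

```python
def allocate(payoffs, total_units):
    """Allocate whole units of one resource among tasks to maximize total payoff.

    payoffs[i][k] is the payoff from giving k units to task i.
    Returns (best_total_payoff, units_per_task).
    """
    # value[r] = best payoff obtainable from the tasks handled so far with at most r units.
    value = [0.0] * (total_units + 1)
    choices = []
    for task_payoff in payoffs:
        new_value, choice = [], []
        for r in range(total_units + 1):
            max_k = min(r, len(task_payoff) - 1)
            # Try every feasible allotment k to this task; the rest goes to earlier tasks.
            k_best = max(range(max_k + 1), key=lambda k: task_payoff[k] + value[r - k])
            new_value.append(task_payoff[k_best] + value[r - k_best])
            choice.append(k_best)
        value = new_value
        choices.append(choice)
    # Walk back through the recorded choices to recover the allocation.
    allocation = []
    r = total_units
    for choice in reversed(choices):
        allocation.append(choice[r])
        r -= choice[r]
    allocation.reverse()
    return value[total_units], allocation


# Hypothetical payoff tables: three tasks, each able to use 0 to 3 units; 4 units available.
payoffs = [[0, 5, 9, 10], [0, 4, 7, 11], [0, 6, 7, 7]]
best_payoff, units = allocate(payoffs, total_units=4)
print(best_payoff, units)
```

The payoff tables play the role of the payoff function, and the cap `total_units` is the single resource constraint; the recursion then carries out the second part of the solution, finding the maximizing set of allocations.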


Income Change: An Adjustment for Extended Time Series. Lloyd Saville, Duke University.


Normally we adjust time series for seasonal, cyclical, population, and price changes. Each of these corrections 
tends to make the information more understandable to the reader. One further adjustment in time series to take 
account of shifts in income might be attempted. For example, a tax of one dollar in 1813 differed very little in value
from a tax of one dollar in 1951, for the price index at both times was about the same. As a percentage of per capita 
income, however, the tax in 1813 was roughly five times as great as the one in 1951. Modification of the values in
the time series to reflect this additional change may help in the comprehension of its significance. 

An effort is made therefore, to develop an index of real per capita income. Using conveniently available infor- 
mation, such an index is presented, the use of the completed index is illustrated by an application to an historical 
series, and the results of the trial are compared with collateral information in an effort to assess the validity of the 
technique. Finally, some conceptual problems associated with an index of real per capita income are identified.


Theories and Methods of Differentiating Urban Social and Demographic Areas. Calvin F. Schmid and Santo F.
Camilleri, University of Washington, Earle H. MacCannell, San Diego State College, and Maurice D.
Van Arsdol, Jr., University of Southern California.


This paper evaluates quantitative typological procedures for delimiting urban social and demographic areas
from census tract data. The approach to typology of this research involves a quantification of type concepts used
in earlier social science investigations. Census tract types are constructed by delimiting an attribute space of census 
tract measures which represent points or dimensions in such a space. Dimensions of the attribute space are reduced 
to enable the substitution of fewer dimensions for the more numerous original dimensions. Graduations on indexes 
of the new dimensions are used as cutting points for delimiting social area types as based on clusters of census tracts 
with similar score patterns. 

The following quantitative typological methods are discussed: (1) ranking systems, (2) cluster analysis pro- 
cedures and (3) systems based on factor analysis techniques. Evidence is presented concerning the empirical 
generality of census tract indexes and types. The utility of social and demographic areas for market research and 
urban planning is discussed. The effectiveness of quantitative typologies for delimiting “natural” subareas of cities 
is illustrated. Statistical evidence is introduced in reference to the effectiveness of typological procedures for pre-
dicting the occurrence of other and independently defined census tract characteristics.


Problems of Definition, Concept, and Interpretation of Research and Development Statistics. Willis H. Shapley,
Bureau of the Budget. 


The paper considers a number of general problems of definition, concept, and interpretation of research and 
development statistics as exemplified by the problems of budgetary statistics on the military research and develop-
ment programs. A brief historical resume will indicate the evolution of the concepts that have been used and high-
light a number of changes which have had major effects on the totals reported. The present composition of the total
military research and development budget will be analyzed to illustrate some of the conceptual and practical prob-
lems involved in defining the concept of “research and development” which have significant effects on the resulting
statistics. Conceptual and practical problems of some of the different types of classifications of research and develop-
ment statistics desired will also be discussed. Finally, consideration will be given to the meaning and interpretation
of budgetary and other financial statistics on research and development and some conclusions offered as to their 
relevance to policy decisions. 


New Measures of Recession and Recovery. Julius Shiskin, Bureau of the Census.


The statistical reporting system used to follow the course of the 1957-58 recession included several new features: 

1. Successive monthly comparisons of the rates of decline in more than 300 economic series from the peak of 
the 1957-58 recession with rates of declines over corresponding periods of earlier recessions. 

2. A monthly diffusion index for about 200 leading series, another for about 100 roughly coinciding series, and
a third for all 300 series. Ten separate diffusion indexes were also computed for industry, geographic area, and eco- 
nomic process groupings of these series. 

3. Direction-of-change tables for each of these 300 series, grouped into 10 major classes of economic activities. 

4. The development of a reasonably inexpensive technique for computing weekly seasonal adjustment factors 
and the application of such factors to a few weekly series. 

The statistical techniques that were used were based on the work of business-cycle analysts, especially at the 
National Bureau of Economic Research, over the last 50 years. The summarization and analysis of 300 series was 
accomplished in a few days and the results were made available only one day after the last group of series was re- 
ceived—on the 15th, 16th or 17th of the month following that covered by the statistics. These improvements were 
made possible by new computer programs. As these studies shift from recession to recovery, greater emphasis is 
being placed upon price series, so that inflationary trends can be more fully observed. It is also expected that addi- 
tional groups of leading series will be added and that greater use will be made of seasonally adjusted weekly series. 
Furthermore, it is planned to put many of the rates of change on a per month basis to facilitate different kinds of 
business-cycle comparisons. 
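
The abstract does not define the diffusion index. The sketch below assumes one common convention: the percentage of series rising from one month to the next, with unchanged series counted as half. The function name and the sample values are hypothetical.

```python
def diffusion_index(previous, current):
    """Percent of series rising between two months, counting unchanged series as half."""
    rising = sum(1 for p, c in zip(previous, current) if c > p)
    unchanged = sum(1 for p, c in zip(previous, current) if c == p)
    return 100.0 * (rising + 0.5 * unchanged) / len(previous)


# Hypothetical values of four series in two successive months.
last_month = [100, 50, 20, 80]
this_month = [105, 50, 18, 90]
index = diffusion_index(last_month, this_month)
print(index)  # two rising and one unchanged out of four
```

Under this convention a reading above 50 means that more series are expanding than contracting, which is why such an index is useful for tracking how widely a recession or recovery has spread.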


What is New in Our Eighteenth Decennial Census of the Population? Henry S. Shryock, Jr., Bureau of the Census.


As compared with the many and considerable changes that are planned for the 1960 Census in methods of data 
collecting and of office processing, the changes in the content of the population schedule are few. The net result will
be a slightly larger number of items. The major innovation is a question on place of work, coupled with a question 
on the principal means of transportation used in getting from home to work. Some additional information on 
population mobility and education will be collected. Obtaining the name of their company, business, or other em-
ployer should lead to increased accuracy in classifications of workers by industry. Changes in ways of asking the
questions should also yield more accurate data on several other characteristics, such as age and employment status. 

A few changes will be made in the system of statistical areas used in the publications, perhaps the most notable
being the extension of census county divisions from one State to 17. Plans for the publication program are still being 
drawn up. There will certainly be a greater emphasis on statistics for small areas, however. Finally, there will be a
fairly extensive evaluation program including another Post Enumeration Survey. We shall not only be actively
concerned with the measurement of the accuracy of coverage but also with ways of improving coverage during the
canvass. 


Operations Research, Models and Data Organization. Martin Shubik, General Electric Company.


Recent interest in the application of scientific methods to the managing of business and other organizations has
resulted in an awareness of the business as a whole, as an information processing and action system. 

In view of the complexity and variability of economic, business and other organizational phenomena, there are 
three major areas of work with which the statistician, operations researcher or behavioral scientist must deal. They 
are (1) model building, (2) data organization, and (3) computing machine methods. This is by no means a neat
trichotomy; the areas overlap and merge and are of varying importance in different problems. In this paper a few 
items are discussed to illustrate some of our problems. 

The interest in operations research and in the application of scientific methods to management problems, has 
resulted in a new awareness that the so-called aggregation problem faced so often is the general problem of model 
building and data organization for a purpose. The processing and valuation of information are now becoming fully
recognized as an integrated and productive part of industrial and other organizations. 


Demographic Aspects of Military Statistics. Jacob S. Siegel and Meyer Zitter, Bureau of the Census.


This paper focuses attention on military personnel statistics as a body of basic statistics important to the field 
of demographic studies. Our Armed Forces, once of negligible size, have become fairly numerous since about 1941 as 
a result of World War II and the ensuing period of sustained “cold war.” Consequently, consideration of statistics
on the military population enters into many problems with which demographic statisticians are concerned. Illustra- 
tions are given in the area of census enumeration, analysis of local population changes, the preparation of population 
estimates for states and local areas, etc. The effects of the military population on various fields of population study
(e.g., vital statistics, migration, and labor force) are noted. The paper summarizes the principal sources of data on
the size, distribution, and demographic, social, and economic characteristics of military population. The nature and
characteristics of the various reporting systems producing basic demographic data for this group (U. S. Census of
Population, Selective Service System, Coast Guard, Navy, Marine Corps, Army and Air Force) are described.
Finally, discussion is given of the coordination, comparability, and consistency of data from the various reporting
systems. 


A Sample of Developments from Sampling Projects of the U. S. National Health Survey. Walt R. Simmons,
Public Health Service. 


The U.S. National Health Survey recently has completed its first year of operations. The program of the Survey
as authorized by Congress consists of three related but separable major aspects. The first is a continuing nationwide 
Health Household-Interview Survey. The second is a series of special studies designed to secure types of data not 
readily obtainable from household interviews, but which complement the latter. The third aspect of the program, 
specifically recognized in the legislation, is experimental and exploratory work on methodology for collecting data 
on illness, accidents, injuries, impairments, and associated health conditions. 

The paper presents a selection of developments in each of these three phases of activity. It includes a “sample” 
of substantive findings from the household survey in areas of acute illness, injuries, disability, and use of medical 
and dental facilities. A similar selection is offered of developments in the special studies phase with general emphasis 
on pilot projects related to an intended Health Examination of a sample of the population. 

Sampling of program activity is concluded with brief descriptions of several technical developments and prob- 
lems. Attention is given here to special features of survey design in the household survey, including experimental 
statistical work associated with maximum exploitation of the continuing character of the design, and to steps toward
evaluation of Survey activities. 


The Econometric Approach to Tax Revenue Estimating. Thomas Lee Smith, U.S. Treasury.


Econometric approach peripheral to tax estimating. Treasury uses single-equation and small recursive systems 
to test corporate profits and other items for consistency with general economic forecasts and to check more elaborate 
individual income tax estimates. Price elasticities considered subject to wide margin of error. Corporate profits
difficult to estimate. 

Treasury frankly empirical, tested accuracy of prediction being decisive. Empiricism defended. Corporate 
profits equation presented is one of several being watched. Probably transitory, though has predicted well and met
other statistical criteria satisfactorily. How well need these criteria be satisfied? 

Business sales represented by private GNP. The “mix” represented by value of industrial production and by 
motor vehicle factory sales.

Labor cost reflected by two factors: compensation of nongovernment employees, and employer contributions 
to private pension and welfare funds. Their regression coefficients accurately account for corporate employee 
compensation. 

Remaining factors are the corporate IVA, the “statistical discrepancy,” and the effective corporate tax rate.

Accurate predictions not obtained when depreciation introduced. 


Family Planning in Medical Practice. Sydney S. Spivack, Columbia University.


This paper reports on a study of physicians’ attitudes and behavior in advising contraception or sterilization. 
The study is being carried on at the Bureau of Applied Social Research, Columbia University, with principal sup-
port coming from Population Council, Inc.

This is the first detailed study of the physician's role as adviser on contraception and family limitation. It is 
based on interviews with 551 physicians (Obstetricians-Gynecologists, Internists, and General Practitioners), in 
6 different localities. (Two of these were large cities, two small-town areas, and two, clusters of rural counties. In
the large cities, some sampling was used; elsewhere all physicians in practice were contacted.)

The study is concerned not only with extent and circumstances of approval of family limitation and contra-
ception, but also the degree to which physicians are involved or active in advising on contraception and willing to
take the initiative in broaching the subject. This paper discusses the formation of an Index of Involvement (in 
advising on family planning and contraception), and the relationship of such involvement to other variables, such 
as the doctor's religion and that of his patients, his age, specialty, and professional status. 


Some Suggested Aids to Teaching Statistics. R. Clay Sprowls, University of California, Los Angeles. 


This paper discusses three different aids which the author has found useful in teaching statistics. In order of 
increasing importance, these are a mechanical coin-flipper, a general answer sheet for problems of testing hypotheses, 
and a high-speed electronic computer. 

The mechanical coin-flipper is an alternative to bead boxes, coins, and dice, with the advantage of drama, 
noise, and versatility. 

A general answer sheet for testing hypotheses has been devised which begins with questions about the hypothe- 
ses, the two types of error, and the costs of making these errors, and ends with the decision rule and the operating 
characteristic curve. Only those parts which deal with the probability distribution of the random variable need be 
changed in order to use this same general form for problems of percentages, means, correlation coefficients, etc. The 
value in this approach is the unity given to seemingly different statistical methods. 

Finally, the high-speed electronic computer is considered as a tool for teaching statistics. The point of view is 
taken that students should become as casual in their approach to a high-speed computer for the solution of statistical 
problems as they presently are in their approach to slide rules and the laboratory equipped with desk calculators. 


Teaming up Censuses and Samples. Frederick F. Stephan, Princeton University. 


The use of sampling in conjunction with censuses, inventories, and other “complete coverages” is increasing. 
The combination has distinct advantages in addition to economy and making more data available to meet increasing 
demands. It enlarges the domain of choice in the design of statistical operations, makes possible better field operating 
procedures, and provides supplementary information about the quality of the data. To gain these advantages, stat- 
isticians must solve new problems in managing field operations, designing samples, processing data, and combining 
estimates derived from different systems. In these problems it becomes necessary to specify the needs of the ultimate 
consumers of the data in greater detail and with greater accuracy. Complex judgments of the value of the informa- 
tion are also required. Moreover, the designing and other preparations often must be completed sooner than in 
simpler operations. Some examples of the problems and progress in the Population Census display the importance 
of these developments. 


A National Accounting System for Measuring the Inter-Sectoral Flows of Research and Development Funds in the 
United States. Herbert E. Striner, Johns Hopkins University. 


The focus of this paper is the problem of ascertaining who actually does research and development in the United 
States, and who provides the funds to do this research and development. In the development of the paper, an ac- 
counting system, in the form of a matrix, is designed which relates the flow of R&D funds within and between the 
various major sectors of the economy. 

Ours is increasingly a science-oriented society. A table of accounts which measures and traces the flows of R&D 
funds through the economy is of great present interest and will become of increasing importance to us in the future. 
Such an accounting system is vital to an understanding of the relationship between R&D, technological innovation, 
economic growth and the groups or sectors in the economy which affect, stimulate, or do R&D. With such a table 
of accounts shifts in the relationships can be spotted and necessary actions, if needed, can be pressed for either 
by private or governmental agencies. 

One of the major conclusions of this paper is that steps should be taken in the near future to innovate such a 
system of accounts by an appropriate federal agency. The precedent of the National Science Foundation Surveys of 
R&D would point to the real possibility of introducing such a system within the next five or six years. 


What Will the 1960 Censuses Do? Conrad Taeuber, Bureau of the Census. 


The 1960 Censuses of Population and Unemployment, Housing (including Utilities and Equipment), and 
Agriculture, Irrigation and Drainage will provide a comprehensive inventory of the Nation's resources in these 
fields. Steps have been taken to speed release of data by speeding up enumeration, collecting and tabulating a sig- 
nificant proportion of the data from a 25 per cent sample, and by the use of high speed electronic computers for the 
tabulation. Users of small area data will find the number of tracts approximately doubled since 1950. The substitu- 
tion of stable census county divisions for unstable minor civil divisions in 16 States will make data for such areas 
more useful. Improvements in coverage and quality are being sought primarily through greater use of the best in- 
formed (rather than the most accessible) respondent, and through greater use of information available from local 
sources to identify persons who might otherwise have been omitted. Some items have been dropped because no 
longer useful—a few questions will appear for the first time. Sample surveys will aid in providing comparability with 
past series where changes were needed. Quality control procedures will be used throughout. Work has been started 
on a Monograph Program to supplement the census reports. 


Stratified Sampling from the Negative Exponential Population. Joseph V. Talacko, Marquette University. 


The widespread application of statistical sampling techniques in the field of auditing, accounting, inventory 
and management brings statisticians several theoretical and practical problems. 

The requirements to provide unbiased estimates of mean values and aggregates, with an indication of the possi- 
ble sampling error and confidence intervals are fulfilled by using classical stratified random sampling. To find the 
necessary estimates may be a too cumbersome arithmetic process and we have to adjust the standard methods. If 
the population is approximately of the negative exponential type, as in most of the named fields of applications, 
we have observed some advantages of stratified random sampling: 

Stratified random sampling from the population which does conform to a probability model 

$$f(x) = \theta e^{-\theta x}, \qquad x > 0,$$

has these properties: 

1. The unbiased estimate of $\theta$ is $\hat{\theta} = $ [expression illegible in the source]. 

2. For populations with means in the interval 

$$0.5 < \theta^{-1} < 20,$$

the strata intervals may be kept constant and the length (approximately) 
$a^0, a, a^2, \ldots, a^n$, where $a_j = a^j = $ [expression illegible in the source]. 

3. In such cases, the population variances are estimated for every stratum. 


4. The variances $\mathrm{Var}(\bar{x}_{st})$ and $\mathrm{Var}(N\bar{x}_{st})$ are then functions of population variances with sample and strata 
sizes. 

5. The optimum allocations of total sample may be tabulated for constant length of strata and particular 
means. 


The Problem from the Standpoint of the Scientist and Technician. Merriam H. Trytten, National Research 
Council. 

The question of whether there is a “shortage” of personnel of any category in our society is only in part, and 
perhaps a minor part, a statistical question. It involves the difficult matter of definition. It involves value judgments, 
in that the goals and needs of our society are involved. It involves the question of quality or productivity. It also 
involves an understanding of the basic dynamism of our civilization in the modern world. The instantaneous balance 
between job-seekers and job openings can only be interpreted in the light of these more basic factors. 


Reliability Functionalism. John P. Tuggle and Leonard Rado, Westinghouse Electric Corp. 


At the present time, a great deal of effort in the reliability field is being devoted to obtaining the reliability of 
components under varying conditions of environment and usage. The component reliabilities obtained are, in many 
instances, then used to predict the reliability of high cost item assemblies. A wealth of data (not always easily ob- 
tainable) is being accumulated on existing components. Whether the gap between the known and unknown relative 
to component reliability is decreasing or increasing is a debatable issue, since new components and new applications 
for the components continue to appear. A means to narrow this gap is available, and consists of evaluating the per- 
formance of each component with respect to the effect of the basic elements (material, mechanical configurations, 
etc.) upon that performance. This may be done in many instances by suitable test design and analysis. Once the 
effects of the basic elements upon a component's reliability are understood, this information may be used to predict 
the reliability of yet untested components by methods analogous to the prediction of reliability for high cost items. 
The procedure mentioned would supplement rather than replace reliability methods now in common use. The initial 
cost and paper work of such a program would, of course, be high, but eventually, such a program will pay dividends 
in the form of increased component and assembly reliability at less cost. 


The Influence of the Number System on Estimation. Stanley H. Turner, University of Pennsylvania. 


This paper states that the number system has an influence on the reporting of numerical information when the 
information is not exactly known but only estimated. Numbers which are multiples of the divisors of the base used 
are said to be more frequently reported than non-divisors. Data are analyzed from two sources. The first is age 
reporting in the U. S. census for the last 80 years. The second is a series of estimates of the numbers of dots on cards. 
The predicted rank order of ending digits of age correlates very highly with the observed rank order. 


Factors in the Rising Cost of Living. Aryness Joy Wickens, U.S. Department of Labor. 


The rise in the Consumer Price Index since 1939 is the product of many complex forces, with first one and 
then another influence predominating. There is no one devil in this piece. During the war, despite credit inflation, 
consumer prices were held fairly closely except for the initial pre-control increases in foods, clothing and rent. After 
the war, too-early removal of rationing and of price and wage controls, fed by long-accumulated consumer demands 
backed by savings, brought a very rapid price rise, in which all elements in consumer prices participated. Rents 
lagged as controls continued in some cities but have since caught up steadily. Charges for services, currently 27% of 
the Index, have risen continuously since the war’s end, and the Consumer Price Index has declined or been stable only 
when commodity price changes have been sufficient to offset rising service charges. Contributing to these price in- 
creases have been higher wages, transportation costs, overhead costs, as well as a high and sustained level of con- 
sumer incomes. Differential rates of increase and of responsiveness to business declines are also discussed. 


Measurements Made by Matching with Known Standards. W. J. Youden, National Bureau of Standards. 


Quick estimates of an unknown quantity may be made by matching it against a series of prepared standards 
with known values. Information is lost if the interval separating the prepared standards is very large. If the interval 
is small, more standards will be required and the operator may take more time to come to a decision. 

The matching procedure can be considered as a more or less drastic rounding off procedure applied to data 
with a given standard deviation. A curve has been prepared that shows how the proportion of times an operator 
successfully picks the standard closest to the unknown depends upon the interval between the standards. If known 
materials are available, an estimate of the standard deviation of the matching procedure can be made using the 
proportion of correct matches. A second curve shows how the proportion of times two independent operators agree 
in the standard selected depends upon the interval between the standards. The proportion of unknowns on which 
two operators agree can also be used to estimate the standard deviation. Intervals less than 1.5σ are unnecessary. 
Intervals greater than 3.0σ result in a substantial loss of information. 
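
The dependence of correct matching on the spacing of the standards can be sketched under simple assumptions. What follows is an illustration only, not the author's derivation: it supposes that the operator's perceived value is the true value plus a normal error with standard deviation σ, that the unknown is equally likely to fall anywhere between two adjacent standards, and that the operator picks the standard nearest the perceived value. The function names are hypothetical. With the interval expressed as k standard deviations, the chance of picking the nearest standard under this model is (2Φ(k) − 1) − 2[φ(0) − φ(k)]/k, and the simulation serves as a check on that expression.

```python
import math
import random


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def p_correct_exact(k):
    """Chance of picking the nearest standard when standards are k sigma apart.

    Assumes (illustration only) normal matching error and an unknown that is
    uniformly placed between adjacent standards.
    """
    return (2.0 * normal_cdf(k) - 1.0) - 2.0 * (normal_pdf(0.0) - normal_pdf(k)) / k


def p_correct_simulated(k, trials=20000, seed=1):
    """Monte Carlo check of p_correct_exact under the same assumptions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        true_value = rng.uniform(0.0, k)              # in units of sigma
        perceived = true_value + rng.gauss(0.0, 1.0)  # matching error, sigma = 1
        nearest = math.floor(true_value / k + 0.5)    # index of closest standard
        chosen = math.floor(perceived / k + 0.5)      # index the operator picks
        hits += (nearest == chosen)
    return hits / trials


if __name__ == "__main__":
    for k in (0.5, 1.5, 3.0):
        print(k, round(p_correct_exact(k), 3), round(p_correct_simulated(k), 3))
```

Under these assumptions the proportion of correct matches rises from about one half at an interval of 1.5 standard deviations to a little under three quarters at 3.0 standard deviations.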


The Statistical Foundation for Policy Formation in the Federal Reserve System. Ralph A. Young, Federal Reserve 
Board. 


The content of the statistical program of the Federal Reserve System is controlled by the necessities of central 
banking policy formation; is guided by an eclectic administrative philosophy; and poses special problems of com- 
munication of findings. To determine monetary policy the facts as to recent economic events are measured statisti- 
cally; statistics are also used to test the effects of policy actions after they are taken. The statistical program of the 
Federal Reserve, while devoted primarily to current requirements, also engages actively in the development of better 
statistical information in the future. Central banking policy formation often pushes Federal Reserve statisticians 
towards the margin of forecasting. Existing techniques are still inadequate, so this push is generally resisted. In cer- 
tain cases, however, forecasting short-term developments is undertaken. 

Federal Reserve research tradition is one of strong emphasis on empirical observation of institutional processes 
and on an eclectic theoretical approach. Considerable diversity of theoretic and analytic patterns has not only been 
accepted but encouraged. A permissive atmosphere is encouraged in research administration. While formal statistical 
techniques are employed by the Federal Reserve, they are supplemented by direct access to the opinions of informed 
persons in the trade. Electronic computation is being used to support the first kind of work; travel and interview, the 
second. 

Central banking policy requires communication of results to many persons; not only the policy heads of the 
system, but to other Government agencies, to money market specialists, to academicians, and to the public. The 
System's policy has been to disclose the factors involved in policy formation as fully as is consistent with its responsi- 
bilities for credit action. 


Factorial Experiments in Life Testing. Marvin Zelen, National Bureau of Standards. 


This paper discusses various aspects of the planning and analysis of factorial experiments when the data are 
obtained from life tests. Methods are discussed for analyzing life test data which will: (1) enable one to evaluate the 
relative importance of each factor with respect to its influence in causing failure of the components under test; 
(2) suggest necessary criteria to make meaningful interpolation to normal operating conditions from accelerated 
tests at extreme environmental conditions; and (3) permit straightforward analysis of the data. 


Optimum Maintenance Policies. J. A. Zoellner, General Electric Co. 


A model is developed to evaluate the effects of 1) the length of time between inspections, 2) the average length 
of repair time, 3) the average length of inspection time, 4) the chance that equipment failure will occur in the opera- 
tion of turning it “off” (storage condition) and turning it “on,” 5) the mean-time-to-failure in the “off” condition, 
and 6) the mean-time-to-failure in the “on” condition upon the operability of an equipment application. 

Operability is defined as the probability that an equipment will be found operable upon issuance of a command 
which occurs with constant probability density over time. Only the case where the distribution of times to failure 
follows the exponential distribution is considered. Utilizing the model derived to calculate the value of Operability 
for given values of the variables, optimum maintenance policies are derived for typical situations. Letting $p^{(i)}$ be 
the probability that the equipment will be found inoperable at the $i$th inspection stage, it is shown that $p^{(i)}$ 
converges to a constant $p^{(\infty)}$ representing the steady state condition. This illustrates the fact that a given frequency 
of inspection will maintain a specific level of operability regardless of initial conditions. The case where no repairs 
are made but complete spare equipments are stocked is also presented. 


The Role of Treatment Error in Comparative Experiments. George Zyskind, Iowa State College and the University 
of North Carolina, and Oscar Kempthorne, Iowa State College. 


From among the possible types of errors that may arise in the performance of experiments it is useful to dis- 
tinguish the experimental unit error, the treatment error, the measurement error, and the interactive errors of com- 
binations of the above. Some concrete illustrative examples are given. The analysis of variance tables together with 
the corresponding sets of expected values of the mean squares have been worked out for a representative number of 
such experimental schemes. These expectations indicate that, as in the case of interaction of treatments with experi- 
mental material, the presence of interactions of treatment sublevels with main levels and with experimental material 
introduces usually a negative bias in error terms of analyses of variance comparisons for treatment effects and de- 
creases the sensitivity of tests. Whether this is a serious defect or not depends of course on the relative magnitude 
of these variances and/or on the physical nature of the experiment at hand. Repetition of attempts at the various 
treatment levels will usually allow the separation of these two variances, will decrease the amount of bias involved 
in the test of significance for the equality of the treatments, and with some schemes will also allow the performance 
of a significance test for treatment sublevels. 

Formulas for the variance of specific treatment differences and of the average treatment difference show that 
in the presence of interaction these variances are in most cases overestimated by the commonly used “appropriate” 
error mean square. Under general circumstances completely unbiased estimation of these variances is usually not 
possible. 

In view of the fact that the value of knowledge about the presence of main effects is somewhat questionable 
when the overall effect of non-additivities is appreciable, it cannot be urged too strongly that the experimenter try 
to work on a scale which, as far as possible, is additive. The method of the maximum F-ratio can tentatively be sug- 
gested as a guide for finding an additive scale. 


BOOK REVIEWS 


Measurement and Statistics: A Basic Text Emphasizing Behavioral Science Applications. 
Virginia L. Senders. New York: Oxford University Press, 1958. Pp. xvi, 594. $6.00. 


John P. Gilbert, Center for Advanced Study in the Behavioral Sciences 


This book, written by a psychologist for psychology students, gives no indication 
of having been refereed by a statistician. It has many statements to which most 
statisticians would take exception. A few examples appear in the course of this re- 
view. Apart from statistical inaccuracies, however, the book has a unique plan which 
deserves comment. 

According to the dust jacket, “The author’s purpose in writing this book has been 
to present, in an elementary text, the concepts of statistics within the framework of 
measurement theory.” The measurement theory referred to is that of S. S. Stevens 
and, as presented in this book, it consists of classifying measurement into four scales, 
nominal, ordinal, interval, and ratio. The four scales correspond roughly to being 
able to classify, order, measure from an arbitrary zero, and measure from a naturally 
determined origin respectively. The author presents this classification early in the 
book and then proceeds scale by scale to discuss the descriptive statistics appropriate 
to each. This discussion is rather extensive, including a chapter on transformations, 
and, taking up more than half the book, precedes the chapter on probability. The rest of the book 
is devoted to statistical inference, beginning with a general chapter on testing and 
estimation and then taking up statistics appropriate to inference from each scale 
separately. 

There are two points which should be made about this plan. The first is that the 
student is introduced to the restrictions on his calculations before he is introduced to 
the reasons for wanting to make them. The second is that it might be easier for the 
student if the scales were introduced in the opposite order, since he will have had some 
experience with the ratio scale and can be shown how this is successively weakened to 
give the others. 

The above comments have been directed to the general plan and aims of the book. 
The following remarks are directed to the question, “Given that we agree with these 
aims and are willing to put this much effort into presenting statistics in this frame- 
work, is this a good statistics text?” The answer to this question is, unfortunately, 
“No.” The reason for this verdict is the inadequate and inaccurate presentation of 
basic statistical concepts. 

The major deficiency is the almost complete disregard of the problem of sampling. 
More than half the book deals with descriptive measures when the entire population 
is considered known. However, it is an interesting facet of descriptive statistics that 
one purpose of description is to convey an idea of what would happen if one were 
to sample the population. Thus the correlation coefficient is developed around the 
idea of predicting one variable Y from another X, but this only makes sense if we are 
picking a pair (X, Y) at random from all pairs (X, Y), which have a given value of X. 

This failure to introduce sampling early in the book raises some interesting prob- 
lems. Thus, in discussing the normal distribution (p. 202) in the section on descriptive 
statistics, we find that the normal distribution has two parameters when expressed in 
terms of proportions, and three when expressed in terms of frequencies (the third be- 
ing N). 

The only discussion of sampling occurs on pages 367-369. This discussion 
contains the following two statements: “It is extremely important that the sample be 
representative of the population about which inferences are to be made,” and “A 
much better way of selecting a representative sample is by random sampling. A sam- 
ple is called random if every member of the population has the same chance of being 
included in the sample.” The fact that the whole logic of statistical inference rests 
upon the sampling procedure is not mentioned. Independence of the observations 
comprising a sample is mentioned only once (on p. 426, in the derivation of a confi- 
dence interval for the median), and the term “Independent Random Sample” does not 
occur. 

The following three quotations serve to illustrate the inaccuracies. (The reviewer 
has sent the author a more extensive list.) On p. 381 in discussing confidence intervals 
we find, “The values 90 and 110 are called confidence limits. We make the statement 
more definite by saying, ‘The probability that the population mean lies between 90 
and 110 is .95.’” The distinction between confidence and probability is further con- 
fused in the next chapter (p. 391), when at the end of an example illustrating confi- 
dence limits for a proportion the author says, “Putting these two results together, we 
can state that we are 95 per cent confident that the population proportion lies be- 
tween .17 and .35. It also follows that if we were to draw many more samples from 
the same population, we should expect 95 per cent of them to contain between 17 and 
35 men in need of haircuts.” On p. 380 we find, “The mean and the median are exam- 
ples of unbiased estimators.” 
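
The distinction at issue in the second quotation can be checked by a small simulation. The sketch below is an illustration only and is not taken from the book under review: it assumes samples of 100, a true population proportion of .20, and the usual normal-approximation interval, none of which is stated in the passage quoted, and the function names are hypothetical. Under these assumptions the interval procedure covers the true proportion in a share of samples close to the nominal 95 per cent, whereas the share of samples containing between 17 and 35 men in need of haircuts depends on the unknown true proportion and is appreciably lower here.

```python
import math
import random


def wald_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def simulate(true_p=0.20, n=100, trials=5000, seed=7):
    """Return (coverage of the interval procedure,
               share of samples whose count lies in 17..35).

    true_p and n are assumed values chosen for illustration.
    """
    rng = random.Random(seed)
    covered = 0
    in_fixed_range = 0
    for _ in range(trials):
        count = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(count, n)
        covered += (low <= true_p <= high)      # does this interval catch true_p?
        in_fixed_range += (17 <= count <= 35)   # the claim quoted from p. 391
    return covered / trials, in_fixed_range / trials


if __name__ == "__main__":
    coverage, in_range = simulate()
    print("interval coverage:", coverage)
    print("samples with 17 to 35 successes:", in_range)
```

The confidence statement describes the long-run behavior of the interval procedure; it says nothing directly about how often future sample counts will fall inside one particular computed interval.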

From recent volumes of this Journal it is evident that the lament over inaccurate 
introductory texts is common and the problem arises as to how the statistical profes- 
sion is going to live with the scientists these books produce. The only suggestion this 
reviewer can make is that perhaps the American Statistical Association can offer to 
provide publishers some sort of refereeing service. It would certainly be more re- 
warding to comment on a book before it is published than afterwards. 


Statistics Essential for Police Efficiency. John I. Griffin. Springfield, Ill.: Charles C Thomas 
Publisher. Pp. xv, 229. $7.50. 


Hans Zeisel, University of Chicago 


This is a valiant effort to solve what is probably an insoluble problem: to tell a 
highly specialized audience all about its particular statistical problems, and con- 
vey in addition the standard content of an elementary statistical text, from measures 
of central tendency to time series analysis. Thus, the book has 78 chapters in all: 
add tables on squares and reciprocals, areas under the normal curve, random num- 
bers—and how many pages do you need? This book has 224 to do the job. This self- 
imposed brevity has great disadvantages. The precision resulting from it can be ap- 
preciated only by a statistician, who knows the stuff and therefore does not need the 
book. It is hard to see how a beginner could use this text without help. The text grew 
out of a course which the author gives at the City College of New York, and to judge 
from the book, it must be a very good course. But my preference would still be: to 
have specialized texts which deal in great detail with the conceptual and other spe- 
cialized problems of a particular field, and supplement them by a good general sta- 
tistical text even if the examples used come from a variety of fields. To write, for 
instance, about measures of association once for educators, once for police officers, 
and once for business men seems a waste. On the other hand, police officers need, for 
instance, a whole course on the proper reporting of the incidences of crime. 
But if this volume, written for police officers, has a better chance to reach them 
than a general text, and if it encourages courses similar to the one in New York, more 
power to it; it will do a useful and badly needed job. 


Experimental Design in Psychology and the Medical Sciences. A. E. Maxwell. London: 
Methuen and Company, Ltd.; New York: John Wiley and Sons, Inc.; 1958. Pp. 147. 
$3.75. 


A. M. Feyerherm, Kansas State University 


The author, who is a Lecturer in Statistics in the Institute of Psychiatry (Mauds- 
ley Hospital), University of London, states that this book “was begun at the re- 
quest of my clinical colleagues who were concerned with the design of quick non- 
elaborate experiments to answer everyday problems in their clinical work, and to test 
hypotheses about individual cases.” In order to reach this particular audience the 
author avoids mathematical terminology and uses worked examples (showing de- 
tailed computational procedures) as a means of illustrating the use of various experi- 
mental designs and the procedures for analysis of the data and interpretation of the 
results which arise from the conduct of experiments. 

This book should appeal to individuals who have some feeling for the design of 
experiments and wish to know what added requirements are necessary in designing 
their experiments in order to use statistical procedures for making tests of signifi- 
cance. As for classroom use, the book not only lacks problems, but it also lacks the 
breadth and depth of coverage necessary to make it suitable as a textbook in experi- 
mental designs. Its title may belie the limited coverage of its content. 

Chap. I contains some basic concepts of tests of significance such as: standard error 
of a mean, analysis of variance table, t and F tests. In Chaps. II through VI the author 
discusses the use of randomized blocks designs, latin square and graeco-latin square 
designs, factorial experiments, cross-over designs, and balanced incomplete block 
designs. 

Chaps. VII and VIII exhibit the computational procedures involved in the use of 
linear regression and correlation, and the analysis of covariance. Chap. IX presents 
an example of the use of a recent design of D. R. Cox for the case where “certain 
treatment arrangements are inadmissible.” Chap. X presents an example of the use 
of a systematic design, also due to D. R. Cox. Chap. XI is on the subject of the 
relative efficiency of different designs. The appendix contains tables for the distribu- 
tion of t, F, and the ratio of $s^2_{\max}$ to $s^2_{\min}$. 

The brevity of this book does not allow for depth in the discussion of the “finer 
points” to be considered in making tests of significance and interpreting the results. 
The reader who is unaware of the subtleties may get the impression that the interpre- 
tation, like the computation, may be done in a mechanical fashion. A few examples 
will illustrate this latter remark: 
(1) Null hypotheses are either “rejected” or “accepted” on the basis of a significance 
level (Type I error) alone. No mention is made of the Type II error. (2) The use of a 
one-tailed t test (p. 53) to answer the question, “Is C, who can write, but has the 
lowest score among the controls, significantly faster than A?” is highly questionable. 
(3) There is no discussion of fixed and random effects (Model I and II of Eisenhart) 
in relation to the analysis of variance and appropriate denominators for the F-test. 

In spite of these shortcomings, those individuals embarking on research in which 
the human being is the experimental unit will find some helpful hints for designing 
their experiments. The detailed calculational procedures outlined in this book will 
be helpful to the researcher who must do his own computation. 


Public Administration and the Public—Perspectives Toward Government in a Metropoli- 
tan Community. Michigan Governmental Studies No. 36. Morris Janowitz, Deil Wright 
and William Delany. Ann Arbor: Bureau of Government, Institute of Public Administra- 
tion, University of Michigan, 1958. Pp. 140. $3.00. Paper. 


Lane Davis, State University of Iowa 


This monograph presents the reader with an expert and highly original examina- 
tion of the largely unexplored area of individual attitudes toward public authority. 
The authors seek to identify and describe “basic perspectives toward administrative 
authority” (p. 101), to evaluate on the basis of criteria drawn from democratic politi- 
cal theory “the extent to which these public perspectives were creating a political 
climate appropriate for administration based on consent” (p. 101) and thus “to con- 
tribute to understanding the strategic consequences of administrative processes on 
contemporary society and democratic consensus” (p. 1). Although remarkable for its 
firm theoretical base, the study is aimed at the practicing public administrator as 
well as the academic student of politics. Both in content and manner of presentation, 
the authors demonstrate their desire to produce a study that can be of immediate 
use to the urban administrator. 

Statisticians and students of survey research methods will find little that is new 
in the technical side of this study. In regard to sample design, interviewing techniques 
and methods of data analysis, the authors have followed the lines previously associ- 
ated with the Detroit Area Study (on the data of which this study is based) and the 
Survey Research Center of the University of Michigan. Appendixes provide brief 
but adequate descriptions of the sampling technique, the social characteristics of the 
sample and the method used to operationalize social class—a major independent vari- 
able in the analysis. Another appendix includes the interview schedule and interview- 
er instructions. 

Data analysis is mainly in the form of cross-tabulations of attitudes and social 
characteristics—in some cases with another variable held constant. Tables are 
generally adequate, though this reviewer found the absence in the tables of any indi- 
cation of statistical significance a minor annoyance. In two cases, short Guttman- 
type attitude scales have been developed. One, ambitiously labelled a measure of 
political self-confidence, is a stripped-down, three-item version of the political efficacy 
scale of The Voter Decides. Precisely what answers to these three simple questions 
mean in relation to the total structure of political values, attitudes and behavior of the 
individual still remains for some future investigator to discover, but for what it is 
worth the scale has the virtue of dividing the sample into three roughly equal groups 
that can be designated low, medium, and high in political self-confidence. 

The other scale is new and rather ingenious. It has been developed to measure the 
extent to which administrative authority is accepted or rejected by members of the 
sample. Responses to open-ended requests for the evaluation of five local and state 
administrative authorities have been content analyzed into categories of acceptance 
and rejection. The result is a scale which, the authors assert, confirms the existence 
of a “generalized perspective toward the public bureaucracy on the basis of adminis- 
trative performance” (p. 32). Perhaps the scale does indicate this, but both the scale 
itself and the discussion in the text on this point are something less than satisfactory. 
Of the five administrative agencies considered, three produce splits between accept- 
ance and rejection which fall outside the 80%-20% limits which are conservatively held 
necessary for a satisfactory item on a short Guttman-type scale. This raises some 
doubts as to whether a reproducibility of .924 is enough to support the statement that 
the scale meets “the statistical requirements of scalability”. Further, in discussing 
the degree of acceptance of administrative authority, the authors shift from unfavor- 
able evaluations (which are the basis for the rejection category in the scale) to “out- 
spokenly critical” or very unfavorable evaluations without explicit explanation. The 
effect is to confuse the issue and exaggerate the extent to which the scale supports 
the contention of widespread “public acceptance of the performance of metropolitan 
community based agencies” (p. 32). 
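
The two criteria invoked in this paragraph, the coefficient of reproducibility and the marginal split of each item, can be illustrated with a minimal sketch. The response matrix is invented for illustration and is not the Detroit Area Study data. Errors are counted as deviations from the ideal pattern implied by each respondent's total score, which is one common convention and is assumed here, since the review does not state the authors' own rule. The function name and the `limit` parameter are hypothetical.

```python
def guttman_summary(responses, limit=0.80):
    """Summarize 0/1 item responses as a Guttman-type scale.

    responses: list of rows, one per respondent; 1 = acceptance, 0 = rejection.
    limit: largest share allowed in either category of an item (0.80 in the text).
    """
    n_resp = len(responses)
    n_items = len(responses[0])
    # Share of respondents accepting each item (the item marginals).
    marginals = [sum(row[j] for row in responses) / n_resp for j in range(n_items)]
    # Rank items from most to least accepted.
    order = sorted(range(n_items), key=lambda j: -marginals[j])
    errors = 0
    for row in responses:
        score = sum(row)
        # Ideal pattern: accept exactly the `score` most widely accepted items.
        ideal = [1 if rank < score else 0 for rank in range(n_items)]
        actual = [row[j] for j in order]
        errors += sum(a != b for a, b in zip(actual, ideal))
    reproducibility = 1 - errors / (n_resp * n_items)
    # Reproducibility that the marginals alone would guarantee.
    minimal = sum(max(p, 1 - p) for p in marginals) / n_items
    # Items whose split falls outside the limit (assumed 80%-20%).
    extreme_items = [j for j, p in enumerate(marginals) if max(p, 1 - p) > limit]
    return reproducibility, minimal, extreme_items


# Invented data: 10 respondents evaluating 5 agencies.
sample = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
]
rep, min_rep, extreme = guttman_summary(sample)
print(round(rep, 3), round(min_rep, 3), extreme)
```

In this invented case three of the five items fall outside the limit, and the marginals alone already guarantee a reproducibility of about .82, which is why a high coefficient by itself is weak evidence of scalability when the splits are extreme.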

With these rather minor qualifications, the analysis is clearly presented with a 
refreshing absence of self-conscious and over-elaborate expertise. The technical 
aspects of the inquiry are, in brief, conventional, competent and relatively unob- 
trusive. They are subordinated to the substance of the study. 

The inquiry into the “public and political context in which public administration 
operates” (p. 5) begins with the common assumption that a democratic political sys- 
tem based on popular consent is desirable. If this is a conventional beginning, the an- 
alysis which follows is not. The problem is the meaning which can be given to “con- 
sent” or, more precisely, the necessary conditions for an administrative process based 
upon “consent”. This is a problem which (as the senior author notes) has somehow 
escaped the attention of both contemporary students of political behavior and demo- 
cratic political theory. 

The first chapter is devoted to a consideration of this problem. Starting from an 
assumption that bureaucratic balance between extremes of despotism (“The despotic 
bureaucracy disregards public preferences and demands. It is likely to resort to 
coercion and manipulation to maintain its power.” [p. 6]) and subservience (“The 
subservient bureaucracy finds itself so concerned with the demands of special inter- 
est groups that it compromises its essential organizational goals and sacrifices essen- 
tial authority.” [p. 6]) is a necessary condition for consensual administration, the 
authors derive four broad perspectives toward public authority. These perspectives, 
knowledge, self-interest, principle mindedness, and prestige, provide both a focus for 
subsequent empirical inquiry and a set of criteria for evaluating empirical findings in 
terms of the prescriptions of American democratic theory. 

Each perspective and the data bearing on it are considered in the four chapters that 
follow. Given the nature of the research design which seeks to build a solid bridge 
from theoretical prescriptions to empirically researchable problems, the senior au- 
thor faces a critical problem in reducing the generalized perspectives to a series of 
specific indicators that have reasonable validity. The problem is explicitly recognized 
and met with considerable if not complete success in the trenchant discussions that 
introduce each of these chapters. Following an analysis of the data, there is a brief 
evaluation of the findings. On the whole, the authors find the present situation only 
mildly encouraging. 

Three of the findings of these chapters deserve brief mention here. The first is the 
rather high degree of consensus toward administrative authority which cuts across 
class, educational, age, and race lines in each of the four areas of investigation. The 
second is the existence of a small (10 to 15 per cent of the sample) group of radical 
dissidents who simultaneously reject existing administrative authority and yet 
demand an expansion of the scope of government. The third is the widespread accept- 
ance of the need for political pull and intermediaries in dealing with government agen- 
cies. Analysis of the latter two phenomena is disappointingly brief and suggests the 
need for further more narrowly focused investigations. 

Chap. 6 examines the relationship between political party identification and the 
presumably representative perspective of self-interest. The findings provide qualified 
support for the hypothesis that attitudes toward administrative authority and parti- 
san politics are relatively independent. Chap. 7 is a rather turgid exploration of the 
difficulties of the urban public administrator in communicating with the public—a 
matter related to the subject of the monograph but treated in a way which falls out- 
side the theoretical and methodological framework which provides unity to the rest 
of the study. The final chapter provides a summary of the findings and a discussion of 
some of the practical implications for the urban public administrator—a discussion 
which reemphasizes the practical objective of the writers. 

What can be said in criticism? There are a number of negative points which im- 
mediately spring to mind. The specific indicators of respondents’ attitudes which are 
taken as providing an operational meaning for the basic perspectives are, at best, 
sketchy and incomplete. The indicators for knowledge are perhaps least satisfactory. 
There is very little in the study beyond broad and nonspecific description concerning 
the community context of the investigation. With the exception of the study of pres- 
tige where earlier studies exist and can be used as a basis of comparison, the inquiry 
lacks any time dimension. It is, of necessity, a static cross section of Detroit, 1953- 
54. One feels in some cases that the authors have not realized all the possibilities 
inherent in the data at their disposal. With the recent attention which has been 
given to the relation between mobility and political perspectives, the failure to con- 
sider information bearing on mobility which was at the disposal of the authors (judg- 
ing from the interview schedule) is difficult to understand. Although the authors are 
concerned with the relation between partisan political and administrative attitudes, 
there is no attempt to discover the extent to which individuals were interested in or 
active in partisan politics, or to discover membership in, or identification with other 
relevant political action groups besides political parties. 

These are weaknesses, but if we concede (as I think we reasonably must) both the 
limitations of the survey research method and the fact that this is an exploratory 
survey of an area that has been too long neglected, the weaknesses fade into the back- 
ground. The study has many virtues. The striking and ambitious theoretical analysis, 
contrasting so vividly with the shallow and superficial designs which characterize so 
much political research, the empirical findings, and, perhaps most of all, the teasing 
questions which are left for further investigation make this study deserving of the 
most careful attention of any thoughtful student of American politics. 


The Income Tax Burden on Stockholders. Daniel M. Holland. Princeton: Princeton Uni- 
versity Press, 1958. Pp. xxv, 241. $5.00. 


Francis M. Boddy, University of Minnesota 


This book is number five of a series of Fiscal Studies under the guidance of the 
Committee on Fiscal Research of the National Bureau of Economic Research. 
The central problem of this study is posed by Holland in his preface as “what is the 
net result [of the imposition of a corporate income tax on corporate earnings and of 
a personal income tax on the distributed earnings] when both distributed and un- 
distributed earnings are considered”? (p. xi) 

The main lines of analysis can be indicated by the two questions: “How heavily, 
compared with other sources of income, have corporate earnings been taxed? How 
heavily, compared with other taxpayers, have stockholders been taxed”? These ques- 
tions may be put in a somewhat different way: “How much greater (or less) was the 
tax liability on the stockholder’s share of corporate earnings and his total income than 
the tax that would have been due had his pro rata portion of net corporate earnings 
been reached fully and promptly by the personal income tax alone”. (p. 5) 

Simply, if the total of corporate earnings is allocated to the stockholders in each 
income bracket in proportion to the dividends received by them, and if the corporate 
taxes on earnings were allocated to the stockholder groups on the same basis, then if 
for any income group of stockholders, the total of the allocated corporate tax and of 
the personal income tax on the dividends received is greater than the potential per- 
sonal income tax on an increment in their incomes equal to their allocated corporate 
earnings, they are “overtaxed” by the “double taxation.” Conversely, if their potential 
personal income tax is the greater, they are “undertaxed.” 
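
The comparison just described can be sketched in a few lines. The figures are invented, and the single flat `personal_rate` applied to a whole income group is a simplifying assumption of this sketch, not Holland's procedure, which works from the actual rate schedules. The function name is hypothetical.

```python
def variant1_differential(group_dividends, total_dividends,
                          total_corp_earnings, total_corp_tax,
                          personal_rate):
    """Differential tax on one income group, following the variant 1 logic.

    personal_rate: one hypothetical marginal personal rate for the group.
    """
    # Allocate corporate earnings and corporate tax in proportion to dividends.
    share = group_dividends / total_dividends
    allocated_earnings = share * total_corp_earnings
    allocated_corp_tax = share * total_corp_tax
    # Actual burden: allocated corporate tax plus personal tax on dividends.
    actual = allocated_corp_tax + personal_rate * group_dividends
    # Potential burden: personal tax on an income increment equal to
    # the group's allocated corporate earnings.
    potential = personal_rate * allocated_earnings
    differential = actual - potential
    if differential > 0:
        label = "overtaxed"
    elif differential < 0:
        label = "undertaxed"
    else:
        label = "neither"
    return differential, label


# Invented aggregates: earnings 100, corporate tax 45, dividends paid 30.
low = variant1_differential(3.0, 30.0, 100.0, 45.0, personal_rate=0.20)
high = variant1_differential(3.0, 30.0, 100.0, 45.0, personal_rate=0.80)
print(low, high)
```

With these invented numbers the low-rate group comes out overtaxed and the high-rate group undertaxed, which is the pattern behind the "cross-over" income discussed in the review.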

This simple measure (which Holland calls variant 1) is modified to take into 
account the fact that some part of retained earnings is reflected in increased share 
values, and hence may be reached eventually by the capital gains tax, although not 
immediately by the present personal income tax. If 72 per cent of the retained earn- 
ings result in taxable capital gains to the extent of two-thirds of this 72 per cent, 
spread equally over five years, then the addition of these capital gains taxes is the 
basis for his variant 2 analysis. Going still further, should the other 28 per cent be 
imputed as a “loss” and be taken account of, then he has variant 3. 

The results for 1950 of these analyses are summarized on a chart (p. 52), and show 
a substantial difference in the “cross-over” incomes (the income level at which the 
overtaxation of the lower income groups changes to undertaxation of the higher in- 
come groups) of the three variants. For variant 1 the cross-over income is about 
$80,000, for variant 2 about $100,000, and for variant 3 about $200,000. 

A wide range of additional questions are raised and answered: changes in the dif- 
ferential tax burden from 1940 to 1952; effect of alternative assumptions (such as the 
assumption that half of the corporate income tax is shifted forward instead of all 
resting on the stockholder, or that corporate income be adjusted to a current price 
level basis); the progressivity effect of corporate income taxation; the aggregate 
effect of the differential taxation of stockholders; and a special consideration of the 
relief provisions of the 1954 Internal Revenue Code. 

The basic data for the study were the annual tabulations of the Statistics of 
Income (Internal Revenue Service). In Appendix B., Notes on Methods, Holland 
spells out by discussion and detailed examples the assumptions and computations 
employed to derive from these data the tables and charts that present his analysis of 
the differential income-tax burden on stockholders. 

The result is a book that not only answers the main questions raised in detailed 
and precise terms, but also investigates a wide range of alternative assumptions that 
might be proposed and shows what the impact of these alternatives is on the results 
of the basic model. The book maintains the tradition of careful, competent and de- 
tailed analysis of statistical data bearing on important economic public policy ques- 
tions that has marked so much of the research sponsored by the National Bureau. 


Major Statistical Series of the U. S. Department of Agriculture, How They Are Con- 
structed and Used. United States Department of Agriculture. Agriculture Handbook No. 
118. Washington, D. C.: U. S. Government Printing Office, 1957. Volume 1, Agricultural 
Prices and Parity, Pp. iv, 87, $.50; Volume 2, Agricultural Production and Efficiency, Pp. 
iv, 74, $.45; Volume 3, Gross and Net Farm Income, Pp. iv, 106, $.55; Volume 4, Agricul- 
tural Marketing Costs and Charges, Pp. iii, 35, $.30; Volume 5, Consumption and Utiliza- 
tion of Agricultural Products, Pp. iv, 91, $.50; Volume 6, Land Values and Farm Finance, 
Pp. v, 56, $.40; Volume 7, Farm Population and Employment, Pp. v, 25, $.25; Volume 8, 
Crop and Livestock Estimates, Pp. iii, 24, $.20; Volume 9, Farmer Cooperatives, Pp. 
iii, 7, $.15. 


Ivan M. Lee, University of California (Berkeley) 


This collection of papers, issued in nine separate publications during the latter half 
of 1957, makes accessible in a single handbook a convenient reference to the major 
quantitative measures of aggregate physical and economic magnitudes regularly con- 
structed and published by the Agricultural Marketing Service, the Agricultural 
Research Service, the Farmer Cooperative Service, and the Foreign Agricultural 
Service of the Department of Agriculture. A listing of chapter headings by volume 
suggests the topical coverage and organization of treatment: Vol. 1—1, The index 
of prices received by farmers. 2, The index of prices paid by farmers for commodities 
and services, including interest, taxes, and wage rates. 3, The parity ratio and parity 
prices. Vol. 2—1, Land use. 2, Production inputs. 3, Production measures. 4, Produc- 
tion per unit. 5, Livestock-feed balance. Vol. 3—1, Farm income and expenditures. 
2, Costs and returns on commercial family-operated farms by type, size, and location. 
Vol. 4—1, Farm-retail price spreads. 2, Elements of marketing cost. Vol. 5—1, 
Measuring the supply and utilization of farm commodities. 2, International trade in 
agricultural commodities. 3, Domestic use of farm food commodities. 4, Nutritive 
value of the United States food supply. 5, Domestic use of farm nonfood commodities. 
Vol. 6—1, Farm real estate. 2, Farm debt. 3, Farm taxes and insurance. 4, Balance 
sheet of agriculture. Vol. 7—1, Farm population estimates. 2, Farm employment and 
farm wage rate series. 3, Farm-operator family level-of-living indexes for the United 
States. Vols. 8 and 9 are a single chapter each. Vol. 8 gives a brief and not too informa- 
tive account of the data collection program of the Agricultural Estimates Division 
and a listing of subjects covered in current statistical reports.1 Vol. 9 consists of only 
seven pages and deals specifically with data assembled for depicting trends in devel- 
opment of agricultural cooperatives in the United States. In view of the diversity of 
topics covered, it does not seem practicable to attempt to give attention to specific 
measures in the space allotted for this review. The comments which follow refer gen- 
erally to the first seven volumes, in which the major emphasis is on measure con- 
struction as opposed to data collection or estimation. 

The separate volumes, and indeed separate chapters within volumes, are in large 
measure independent. Certain important basic data are common to several of the 
measures described and this is acknowledged usually by cross references to entire 
chapters in the same volume or in other volumes. The classification of measures by 
volumes and chapters conforms to the classification reflected historically in the De- 
partment’s reports and is presumably dictated more by organization of work within 
the Department than by over-all conceptual criteria. To be sure, conceptual con- 
siderations are reflected in the classification adopted, but this phase of the measure- 
ment problem needs systematic attention and should have been given more promi- 
nence. This observation is pertinent to certain individual measures discussed, and 
also to sets of interrelated measures. In the measurement of marketing margins, for 
example, making more explicit the economic relations generating the magnitude or 
magnitudes to be measured should contribute materially to precision in definition of 
what is being measured and to interpretation and appraisal of the empirical measure 
constructed. With respect to sets of interrelated magnitudes, the definition of meas- 
ures and systematic treatment of problems associated with measurement of aggre- 
gate production and inputs, prices, and income might be somewhat more effectively 
dealt with in a more comprehensive conceptually integrated framework. Support can 
be developed for the position that such a framework would have important advan- 
tages for discussing rather more precisely the relations between measures sought and 
for bringing the limitations of the empirical constructs actually developed into sharp- 
er focus. 

1 This volume is clearly not intended to do more than give a rather broad view of the estimating program. The 
reader is referred at the outset to The Agricultural Estimating and Reporting Services of the United States Department 
of Agriculture, United States Department of Agriculture Miscellaneous Publication 703, for a more comprehensive 
development of procedures. 

Returning to the material as organized, much of what appears in the collection of 
discussions has appeared in similar form in various publications of the Department of 
Agriculture over the years. Many pages in total are given over to presentation of 
summary series. As is implied by the designation “Agriculture Handbook,” heavy 
textual emphasis on description of procedures employed in the construction of meas- 
ures is common to all chapters. Shorter sections on sources of data, uses and limita- 
tions of measures, and comparison with similar measures constructed by USDA or 
other agencies of the United States government normally follow the description of 
procedures in each chapter. Appraisal of measures is dealt with generally in rather 
routine fashion, and in the absence of explicit conceptualization such comments as 
are offered bearing on appraisal of concepts rest heavily on intuitive plausibility. 
Chap. 1 of Vol. 3, dealing with farm income and expenditure measures, approaches 
the appraisal problem somewhat more systematically than do the other papers. 

Although description of procedures occupies a central place in this Handbook, 
omission of detail is unavoidable in compressing these descriptions into the relatively 
few pages allotted. However, publications dealing in more detail with specific meas- 
ures are prominently cited, so the “careful student” referred to in the Foreword will 
find the Handbook a convenient place to start in his attempt to learn what a par- 
ticular series represents. 


Canada’s Economic Development 1867-1953, Income and Wealth Series VII. O. J. Fire- 
stone. London: Bowles and Bowles, 1958. Pp. xxvi, 384. 45s. 


Woman, State College of Washington 


Firestone has produced a useful addition to the well-known series published for 
the International Association for Research in Income and Wealth. In contrast 
to the preceding volumes of the series which were collections of papers by a number 
of authors, the volume under review represents the work of a single individual. 

The author has set himself three tasks: (1) to prepare new estimates of national 
income and of certain demographic factors for Canada for the period from 1867 to 
1920 (in some cases to 1925); (2) to analyze Canada’s economic growth within the 
framework of national expenditure (product) accounts; and (3) to present an histor- 
ical review of both the private and official national income and wealth estimates for 
Canada which had been attempted prior to Firestone’s own work. 

Firestone’s new estimates include: (1) annual estimates of population and family 
formation for the period 1867 to 1925; (2) estimates of gross national expenditure by 
major components—consumer expenditures (8 items), investment (3 items), gov- 
ernment expenditure (2 items), exports and imports of goods and services (2 items) 
—for 1870, 1880, 1890, 1900, 1910, and 1920; (3) estimates of gross national expendi- 
ture in current and constant (1935-39) dollars for 1867 and annually for the period 
1870-1925; (4) estimates of annual averages for five-year periods of gross national 
expenditure in constant (1935-39) dollars, total and per capita; (5) estimates of gross 
national product for 1870, 1880, 1890, 1900, 1910, and 1920 on an industry basis using 
a value added technique and covering ten sectors. For the years for which estimates 
of both gross national product and gross national expenditures (in current dollars) 
exist (1870, 1880, 1890, 1900, 1910, and 1920), the estimates appear to be gen- 
uinely independent and the discrepancy between the two never exceeds a tolerable 
limit. 

These estimates represent a significant advance over the data which had been 
previously available for the study of the long-term economic growth of the Canadian 
economy. This would be the case if only because the scope of Firestone’s work is 
greater than that of any which has previously been published—no continuous series 
either for the aggregate or for the major breakdowns were available for as long a pe- 
riod. However, the novelty of Firestone’s work is not its only virtue. The value of his 
data is enhanced by the fact that he has succeeded to a considerable degree in making 
his categories—and to this extent his series—comparable with the official estimates 
published by the statistical agencies of the Dominion Government for the years sub- 
sequent to 1920. Great care appears to have been exercised to assure that virtually all 
of the extant sources of primary data should be sifted, and the work is marked by a 
generally high level of competence in the fitting of the basic data into a system of 
social accounts. 

The statistical procedures inherent in the construction of Mr. Firestone’s current 
dollar estimates do not appear to be open to serious question. The same, unfortun- 
ately, cannot be said of his constant dollar estimates. To obtain a consistent base 
year volume index requires given year price indices as deflators. In other words, to 
obtain an index of physical volume at constant 1935-39 prices, Firestone ought to 
have deflated each year’s current values by relevant price indices based on weights 
pertaining to that year. The marked extent to which the construction of Firestone’s 
constant dollar estimates falls short of this ideal for the years prior to 1926 is implied 
at many points, especially on pages 277-80. It is probably the case that inadequate 
basic data forced Firestone into the course which he took. However, it is difficult to 
judge the extent to which this was the case since Firestone’s book at no point dis- 
cusses the methodological issues involved in the construction of a constant dollar 
series, and therefore gives no reasons as to why he produced results which were less 
than ideal. Such a discussion would have proved most useful, both in and of itself, 
and because it would have contributed to promoting a much-needed methodological 
awareness of the issues involved in the construction of consistent constant price 
series among workers in the field of “historical” national income estimation. 
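
The point about deflators can be checked with a small numerical sketch. The two-commodity prices and quantities are invented, and the helper name `value` is hypothetical. A price index weighted by given-year quantities is the Paasche form, and one weighted by base-year quantities is the Laspeyres form. Only the former, used as a deflator, returns the given year's output valued at base-year prices.

```python
def value(prices, quantities):
    """Total value of a basket: sum of price times quantity."""
    return sum(p * q for p, q in zip(prices, quantities))


# Invented data for two commodities.
p0, q0 = [1.0, 2.0], [10.0, 5.0]   # base-year prices and quantities
p1, q1 = [2.0, 2.5], [8.0, 10.0]   # given-year prices and quantities

current_value = value(p1, q1)      # given-year output at given-year prices
target_volume = value(p0, q1)      # given-year output at base-year prices

# Price index with given-year quantity weights (Paasche).
paasche = value(p1, q1) / value(p0, q1)
# Price index with base-year quantity weights (Laspeyres).
laspeyres = value(p1, q0) / value(p0, q0)

deflated_paasche = current_value / paasche
deflated_laspeyres = current_value / laspeyres

print(target_volume, deflated_paasche, deflated_laspeyres)
```

Deflating by the given-year-weighted index reproduces the constant-price volume, while deflating by the base-weighted index does not. In these invented figures the gap is roughly a tenth of the total, which suggests how the shortcut may distort a long series when relative prices and quantities shift.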

As weak as Firestone’s constant dollar figures appear to be, the reviewer would 
guess that they are superior to the constant dollar gross national product estimates 
extant for most countries in the years before commodity flow accounts began to be 
kept, since they embody more than the “usual” degree of disaggregation. However, 
many rights do not right a wrong, and it must be recognized that Firestone’s constant 
dollar results, both for individual subperiods and for the period as a whole, are po- 
tentially subject to considerable error. 

Firestone’s analytical section, comprising well over one half of the book, is divided 
into ten sections: (1) the changing economic and institutional setting; (2) popula- 
tion, families, and the labor force; (3) the nation’s output and markets; (4) consumer 
expenditures; (5) gross investment and capital consumption; (6) government ex- 
penditures; (7) foreign trade and the balance of payments; (8) income and prices; 
(9) changes in the industrial structure; and (10) changes in productivity and capital 
requirements. 

Though replete with interesting information, and offering an illuminating analysis 
of particular phenomena—the discussion of the immigration-emigration issue is par- 
ticularly good—the analytical section suffers from defects as an account of the gen- 
eral pattern of economic growth in Canada. 

Firstly, the discussion is overly compartmentalized. One misses a discussion of the 
interactions of the various factors under consideration. The implicit model underlying 
the analytical section appears to be the “industrialization” model which derives 
ultimately from Petty’s law. This is, of course, all to the good since the working of 
Petty’s law has proved to be a most useful way of organizing discussions of the eco- 
nomic growth of the advanced countries. Yet, this model is nowhere spelled out, nor 
does a general account of Canadian development relating the parts to the whole 
appear anywhere in the book. Secondly, the discussion depends too much on endo- 
genous factors. Canada, because of its limited size, never succeeded—despite Sir 
John A. Macdonald’s “National Policy” which has in most respects formed the 
basis of Canadian economic policy since confederation—in insulating itself from the 
rest of the North Atlantic countries, especially the United Kingdom and the United 
States. Yet important happenings outside the country are not given sufficient expli- 
cit weight in accounting for the pattern of Canadian development. 

The final section presents an historical review of national income and national 
wealth estimates for Canada. This excellent piece of analytical bibliography will put 
many students in Firestone’s debt. 

The book under review represents an important addition to the growing body of 
quantitative literature dealing with economic growth in the advanced countries. It 
is primarily in work of this type that our hopes of developing a genuinely useful 
theory of economic development lie. 


Selected Studies of Migration Since World War II, Proceedings of the thirty-fourth annual 
conference of the Milbank Memorial Fund, held October 30-31, at the New York Acad- 
emy of Medicine, Part III. New York: Milbank Memorial Fund, 1958. Pp. 244. $1.00 
Paper. 


James J. Maslowski, Bureau of the Census 


The Milbank Memorial Fund has performed another valuable service in present- 
ing the papers and some of the discussions from the proceedings of its thirty-fourth 
Annual Conference. This volume consists of two parts. Part I contains ten selected 
studies of migration since World War II and is divided into three groups: Inter- 
national Context, Domestic Setting, and Topical. Part II contains the program of the 
Annual Dinner held on the evening of October 30. Since limitations of space pre- 
clude detailed attention to all the parts of this highly diverse collection of papers and 
discussions, this review will attempt to indicate briefly what is included, and will 
comment on certain papers at greater length than others. 

The first section, “International Context,” begins with a highly satisfactory review 
by Dudley Kirk of the major migratory movements which have occurred since 
World War II in the world at large and in the United States. This review is success- 
fully extended by the specific contributions of the discussants on both forced and 
normal European migrations. Irene B. Taeuber presents a comprehensive analysis of 
the volume and direction of internal migration in Japan since 1945. A significant 
contribution of this study is the testing of the hypothesis that a given area should 
lose migrants to areas more industrial than itself and gain migrants from areas more 
agricultural than itself. Rupert B. Vance concludes the section by admirably propos- 
ing certain prerequisites which are necessary if immigration is to be economically and 
socially sound. 

The second section devoted to “Domestic Setting” has quite a different character 
from the first in that the four papers are exclusively devoted to the United States. 
Ernest Rubin’s report reviews current immigration to the United States and criti- 
cizes existing laws and policies. His recommendations raise substantial questions 
concerning the role of the demographer in policy making. C. Horace Hamilton pre- 
sents a preliminary report on the educational selectivity of rural-urban migration in 
North Carolina. This study is of great import in that two methods are used to test 
educational selectivity. The first method is the extension of the residual survival rate 
method of estimating and analyzing net migration to a population that changes over 
time. The second method is the application of sample survey techniques. Both meth- 
ods indicate that rural-urban migration is highly selective with regard to education. 
Donald J. Bogue presents the economic and social implications of population changes 
in the Chicago Metropolitan Area. Of considerable importance is the stress on the 
need for studying migration on a longitudinal basis in which the experience of real 
cohorts can be traced. No attempt, however, is made to present the intricate prob- 
lems involved in this type of study. The concluding paper, by Everett S. Lee, pre- 
sents some of the findings from a recent study of migration in relation to mental dis- 
ease in New York State during 1949-1951. It is surprising that no attempt was made 
to establish control populations or exposure rates. As Muhsam points out in the 
discussion, the conclusion with respect to selectivity for the propensity to mental dis- 
ease depends on the population of the origin of the migrants, the migrants themselves, 
and the receiving population. 

In the final section “Topical,” John Folger presents a provocative paper on meth- 
ods and models for research in migration. It is not too surprising that in the earlier 
periods of development, the field of migration relied almost exclusively on a few 
elementary, easy-to-understand methods. These are, however, formative years for 
more advanced and organized methods. Models similar to that proposed by Folger 
are vital and can be tested through the application of the more advanced statistical 
methods in conjunction with electronic computers. Joseph Spengler, discussing the 
economic theory of migration, emphasizes the use of models to give direction to re- 
search. It appears obvious that models are needed where the effects of migration are 
extremely complex and have wide-spread ramifications. The final paper, by Simon 
Kuznets and Dorothy Thomas, discusses the relationship between economic develop- 
ment and migration. This study is an excellent example of interdisciplinary research 
in which the multiple economic factors which relate to population movement are 
studied through space as well as time. 

The volume is of great importance to demographers and should interest anyone 
who is actually or potentially interested in migration. Its contents, however, will not 
generally interest professional statisticians, except as they suggest problems which 
might be attacked by different methods. It is obvious, however, that the approach to 
migration analysis in terms of social, economic, and psychological factors presents 
numerous intricate problems in sample survey designs, in the construction of meas- 
uring devices, and in the techniques of measuring interrelations. These types of prob- 
lems should be of interest to professional statisticians, especially those whose field 
of application is demography. 


Migration and Economic Development in Rhode Island. Kurt B. Mayer and Sidney Gold- 
stein. Providence: Brown University Press, 1958. Pp. 64. $1.75. 


BENJAMIN CHINITZ, New York Metropolitan Region Study


This short essay pulls together some interesting figures in a handful of tables on
migration in Rhode Island. It tells the sad tale of many an area in the Northeast 
which has failed to grow with the nation in recent decades. 

In brief, the story is this: between 1870 and 1910 Rhode Island not only attracted 
thousands of immigrants from abroad—60,000 in the decade 1900-1910 alone—it was 
also luring Americans from other states. By 1910 there were 21,000 more out-of- 
staters residing in Rhode Island than Rhode Islanders living in other states. But all 
that has changed. The net migration rate declined precipitously between 1910 and 
1920 from 153 to 28 per thousand and ever since then things have gone from bad to 
worse. The figure for the period 1950-1956 is a stark minus 10.1 per thousand! 

What happened? The most telling blow was the almost complete cessation of im- 
migration from abroad as a result of national policy. “More important, however, 
were economic conditions,” the authors go on to say. There follows the familiar 
account of the decline of the textile industry in New England, under the pressure of 
competition from the South. Once Rhode Island began to decline as a manufacturing
center, the authors assert, it could no longer fare well in the competition for American
migrants. The decline of manufacturing employment and net migration are also found 
to be correlated with another index of economic health, per capita income. In 1880 
Rhode Island’s per capita income was 60 per cent above the nation’s; today, it is bare- 
ly equal to it. 

The essay goes on to examine the composition of migration in the seven decades 
between 1870 and 1950. Two techniques which are discussed in detail in a methodo- 
logical appendix provide the basis for estimates of the sex, age, and race of Rhode 
Island’s migrants, their origin and destination. The latter characteristics are gleaned 
from State of Birth data and the former with the aid of the Survival Rate Method. To 
one unfamiliar with demographic data it comes as a surprise to learn that there is 
such a wealth of information on internal migration. 
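
The review names the Survival Rate Method without describing it. As a rough illustration only, the general forward survival-rate idea can be sketched in Python: the count a cohort would show with no migration is its starting count times a survival ratio, and net migration is the enumerated end-of-decade count minus that expectation. The function name, cohort labels, counts, and survival ratios below are all hypothetical, and the authors' own procedure, set out in their methodological appendix, may differ in detail.

```python
def net_migration_by_survival(pop_start, pop_end, survival_ratio):
    """Forward survival-rate estimate of net migration for one cohort.

    pop_start      -- cohort size at the earlier census
    pop_end        -- size of the same cohort (ten years older) at the later census
    survival_ratio -- assumed share of the cohort surviving the decade
    """
    expected_survivors = pop_start * survival_ratio
    return pop_end - expected_survivors


# Hypothetical cohorts: (count at start, count at end, decade survival ratio).
cohorts = {
    "10-14 -> 20-24": (20_000, 18_500, 0.98),
    "20-24 -> 30-34": (22_000, 23_100, 0.97),
}

estimates = {
    label: net_migration_by_survival(*values) for label, values in cohorts.items()
}

for label, estimate in estimates.items():
    # A negative value means estimated net out-migration for that cohort.
    print(f"{label}: {estimate:+.0f}")
```

Because the estimate is a residual, any error in the survival ratio or in either census count passes straight into the migration figure.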

But there is no systematic statistical analysis of these data in an effort to test 
some notions about the causes and consequences of migration. A few grains of evi- 
dence are scraped off the top, so to speak, which tend to confirm some of the authors’ 
notions about the process. The origin-destination tables show that Rhode Island con- 
tinued to attract migrants from its low per capita income neighbors while it was losing 
them to high per capita income neighbors. The age data, in turn, show that the 
sharpest decline in migration rates occurred in the critical age range from 15 to 44, 
supporting the hypothesis about the failure of job opportunities. 

In the final analysis the value of the essay rests on its descriptive materials. Let the 
reader bring to them some preconceptions about the way old areas decline and he 
will find it stimulating to try them out on Rhode Island as its story is told here. 


Trends in Birth Rates in the United States since 1870. Bernard Okun. The Johns Hopkins 
University Studies in Historical and Political Science, Series LXXVI, Number 1.
Baltimore, Md.: The Johns Hopkins Press, 1958. Pp. 203. $3.50. Paper.


LINCOLN H. DAY, Princeton University


This study, originally a doctoral dissertation in economics at Johns Hopkins, is
based on data compiled from census figures by the University of Pennsylvania 
Study of Population Redistribution and Economic Growth. It brings together new 
empirical evidence of the secular decline in fertility in the United States and contains 
useful breakdowns by race, state, region, residence (rural, urban, rural-farm, and 
rural-nonfarm), and nativity for whites. 

Despite its title, the measures of fertility used are birth ratios rather than birth 
rates: children under 5 divided by (a) total population, (b) women 15-44, and 
(c) women 20-29. Dividing the 1870-1950 time interval into two periods, 1870-1910
and 1910-1950, the general trend is calculated by means of a geometric average of the 
ratios for each period. The degree of internal consistency in these trends is ascertained 
from a 3-item moving geometric average. 
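
The measures described above can be illustrated with a short Python sketch. It computes a birth ratio (children under 5 per 1,000 women 15-44), then a geometric average and a 3-item moving geometric average of the resulting series. The census counts are invented for illustration, and the review does not say whether Okun averaged the ratios themselves or their decade-to-decade changes; the sketch assumes the former.

```python
import math


def birth_ratio(children_under_5, women_15_44):
    """Children under 5 per 1,000 women aged 15-44."""
    return 1000.0 * children_under_5 / women_15_44


def geometric_mean(values):
    """Geometric mean of a sequence of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def moving_geometric_mean(values, window=3):
    """Geometric mean over each run of `window` consecutive items."""
    return [
        geometric_mean(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


# Hypothetical counts: census year -> (children under 5, women 15-44).
census = {
    1870: (130_000, 200_000),
    1880: (150_000, 250_000),
    1890: (165_000, 300_000),
    1900: (180_000, 360_000),
    1910: (190_000, 400_000),
}

ratios = [birth_ratio(children, women) for children, women in census.values()]

print("birth ratios:", [round(r, 1) for r in ratios])
print("geometric average, 1870-1910:", round(geometric_mean(ratios), 1))
print("3-item moving average:", [round(m, 1) for m in moving_geometric_mean(ratios)])
```

A moving average of this kind smooths single-census irregularities, which is presumably why it serves as a check on the consistency of the period trend.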

Okun notes three shortcomings of the birth ratio as a measure of the birth rate: 
(1) the differential declines in the death rate, the decline among children 0-4 being
greater than that for women 15-44; (2) interstate migration; and (3) underenumeration.
Though he sometimes uses the ratio of children aged 5-9 to women aged 20-49
as a measure less affected by underenumeration, the data are not adjusted for any of 
these three sources of error; for as he points out, because of the direction they take, 
“these errors do not distort the conclusions of the paper . . . [;] if adjustments
[were] made to account for the errors, the conclusions [would be] strengthened.”

The study is divided into three parts: the first dealing with secular trends among 
whites; the second with secular trends among Negroes; and the third devoted to a discussion
of hypotheses and approaches used to explain declining trends in birth rates.
The analysis of Negro fertility is based on data from the 14 states—10 in the South, 
plus New York, Pennsylvania, Illinois, and Michigan—which contained 78 to 81 
per cent of all Negroes between 1870 and 1940. 

Okun’s main conclusion about changes in birth ratios among whites is that they 
are due to changes in neither age nor sex composition (except in the West), nor in the 
proportion foreign-born, nor in residential distribution (e.g., from rural to urban). 
They are, instead, “ascribable principally to changes in the reproductive patterns of 
persons in fixed environmental subdivisions” (rural, urban, rural-farm, etc.). 

The findings for Negroes are essentially the same so far as factors of age and sex 
composition are concerned. But there is a major difference between Negroes and 
whites in the “important contribution” which changes in the degree of urbanization 
among Negroes have made to changes in their birth ratios. Okun lists several possible 
explanations for this, but attempts no assessment of their relative merits. 

The third section, “A Review of Hypotheses and Approaches Used in Explaining 
Declining Trends in Birth Rates,” has little relation to what has gone before. It does, 
however, contain a good summary and critique of various “approaches”: the rational 
maximization model approach, time series analysis, cross-section analysis (noting that 
information obtained from such studies is not necessarily applicable to time series 
analysis), the institutional approach, interview and questionnaire techniques, and 
longitudinal analysis. 

This is a thorough, well-organized study. It does not provide an explanation of the 
changes in fertility; but it does narrow “the range of conjecture” and should for that 
reason prove valuable to all students of American fertility trends. 


Proceedings of the Fourth Annual Computer Applications Symposium. Armour Research 
Foundation, 1957. Pp. vii, 126. $3.00. 


O. W. RECHARD, State College of Washington


This small volume contains the papers presented at a two day symposium in Chicago,
October 24 and 25, 1957.

The first day of the symposium was devoted to papers describing business and 
management applications of computers. These were: (1) “An Extensive Hospital 
and Surgical Insurance Record-Keeping System,” by R. J. Koch, a description of 
Blue Cross-Blue Shield record keeping in Michigan using a Datamatic 1000, and a 
discussion of characteristics of the Datamatic that led to its adoption for this purpose.
(2) “A Central Computer Installation as a Part of an Air Line Reservation System,”
by R. A. McAvoy, a description of a typical air line reservation system and a
discussion of Eastern Air Lines’ plans to utilize a Univac File Computer together with 
a special input-output device called the “agent set” to perform the functions of a 
control agent in its own system. (3) “Fitting a Computer into an Inventory Control 
Problem”, by O. A. Kral, an account of how the Applied Mathematics and Statistics 
Department at Minnesota Mining and Manufacturing used this firm’s I.B.M. 705 
computer to study an inventory problem involving seventeen branch warehouses 
and one plant warehouse. (4) “The Problems of Planning New Metropolitan Trans- 
portation Facilities and Some Computer Applications”, by J. D. Carroll Jr., an out- 
line of some of the problems facing metropolitan planners and an indication of how 
Chicago has utilized a Datatron digital computer together with two special purpose 
analog machines in its traffic studies. (5) “Data-Processing Tasks for the 1960 
Census”, by D. H. Heiser and Dorothy P. Armstrong, a discussion of the data proc- 
essing problems involved in the 1960 census and the plans of the Bureau of the Census 
to utilize a Univac 1105 Computer with a special input device to solve them. (6) “The
Handling of Retail Requisitions From a General Warehouse”, by M. J. Stoughton, a
description of how the Sears, Roebuck Co. plans to use a Datatron in its Chicago
Mail Order house to facilitate the handling of requisitions from its retail stores.
(7) “Automatic Programming for Business Applications”, by Grace M. Hopper, a 
discussion of the history of automatic programming and of the philosophy behind 
the development of Flow-Matic, Remington-Rand’s programming system for busi- 
ness applications. 

The following papers dealing with engineering and research applications were 
presented on the second day of the symposium: (1) “Digital Simulation of Active 
Air Defense Systems”, by R. P. Rich, a description of a “Monte-Carlo” type study of 
the effectiveness of various doctrines of weapon assignment, carried out on a Rem- 
ington-Rand 1103 A computer. (2) “Statistical Calculations in Product Development 
Research”, by E. B. Gasser, a discussion of the use of a Librascope LGP-30 in the 
statistical calculations (mostly concerned with applications of the analysis of variance 
technique) arising in the research laboratories of the Toni Company. (3) “Progress in 
Computer Application to Electrical Machine and System Design”, by E. L. Harder, 
a rather discursive paper touching on developments in machines and programming 
as well as a wide range of applications in the field of engineering. (4) “How Lazy Can 
You Get?”, by A. L. Samuel, a speculative paper suggesting future developments in 
machine programming with particular emphasis on recent attempts to use a large 
scale computer to simulate intelligent behavior. (5) “The Solution of Certain Prob- 
lems Occurring in the Study of Fluid Flow”, by L. U. Albers, a discussion of the 
numerical solution (on an I.B.M. 650 computer) of some of the differential equations 
arising in fluid flow problems at the NACA Lewis Laboratory. (6) “A Dual-Use 
Digital Computer for Dynamic System Analysis”, by E. H. Clamons and R. D. 
Adams, a description of the Bendix Digital Differential Analyser as an attachment 
to the G-15 D general-purpose computer. (7) “The Status of Automatic Program- 
ming for Scientific Problems”, by R. W. Bemer, a catalogue of automatic coding sys- 
tems that are either operational or in the process of development together with brief 
descriptions of some of the more important ones. 

As in any symposium, there is a wide variation in the quality and interest of the 
papers. Generally, it seemed to me that the business and management applications 
reported here were more unusual and interesting than the engineering and research 
applications. This is due, in part, to the fact that the application of high speed digital 
computers to business and management problems is a much newer and, probably, in- 
trinsically broader field. High speed stored program computers were originally de- 
signed for scientific uses and it is only within the last few years that their tremendous 
potential in the general field of information processing has been recognized. 

Of particular interest to readers of this Journal would be the papers by Kral, 
Heiser and Armstrong, Rich, and Gasser. It is perhaps worth pointing out that this 
list includes two papers from each of the major divisions of the symposium. This is, 
of course, no accident. The statistician’s interest in computers runs the entire gamut
of their utility, from the processing of vast quantities of data (a typical “business”
application) to the inversion of large scale matrices. He must concern himself with
all of the complexities of input and output that bedevil the commercial users of com- 
puting equipment and at the same time be familiar with many of the problems and 
procedures at the frontiers of research in numerical analysis. 

In addition to the papers listed above, I would recommend the two papers on 
automatic programming and the luncheon address by A. L. Samuel as introductions 
to some of the more interesting and creative programming efforts now being made to 
broaden the scope of machines to include even more areas of human activity.


Principles and Applications of Random Noise Theory. Julius S. Bendat. New York: John
Wiley and Sons, Inc., 1958. Pp. xxi, 431. $11.00.


EMANUEL PARZEN, Stanford University


This is an exposition of those statistical ideas introduced into engineering during
the second world war, whose wartime development was summarized in the classic
works of S. O. Rice (“Mathematical Analysis of Random Noise,” Bell System Technical
Journal, Vol. 23 [1944], pp. 282-332 and Vol. 24 [1945], pp. 46-156) and
Norbert Wiener (The Extrapolation, Interpolation, and Smoothing of Stationary Time 
Series with Engineering Applications, John Wiley and Sons, Inc., New York, 1949). 
The main topics discussed in the book are (i) the notion of the correlation function
and spectrum of a random process, and (ii) the notion of an optimum linear filter and 
predictor. These topics are discussed at great length and with a good deal of heat. 

The engineer on the job may find the book useful, since it contains a large amount 
of very detailed information concerning correlation functions and the problem of op- 
timum filter design. However, it is doubtful whether the student (or the statistician) 
can obtain from this book a clear picture of the role these notions play in the design 
of modern electronic communication and automatic control systems. The writing is 
verbose, and contains a large number of incorrect and ambiguous statements. This 
would not be so blameworthy were it not that the author claims to go beyond other 
writers in pointing up mathematical restrictions which need to be imposed on the 
conclusions drawn. 

A typical example of a statement in the book which is both unclear and incorrect 
is the following: After deriving (without stating necessary differentiability hypothe- 
ses), the result that for a wide-sense stationary random function z(t), with derivative 
z’(t), the correlation E[z(t)z’(t)]=0 for all t, the author goes on to interpret this 
result (given in equation [1-70] on p. 21) as follows: “In words, Eq. 1-70 states that 
the average slope over the ensemble records at any time t has an equal probability of
being positive or negative.” Presumably, the author means to speak about the slope 
rather than the average slope; even then the assertion is false (a counter-example may 
be constructed using the derivative of the shot effect model). The author employs notions
and notation which are objectionable from the point of view of modern probability
theory, and which make the book somewhat unpleasant reading for mathematically
trained persons. Particularly indefensible is the definition of a random process
{ₖx(t)} as an ensemble of functions indexed by a parameter k; he makes no mention
of the role of k as an element in a probability space. 
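
For reference, the result quoted above can be obtained in a few lines once the differentiability hypothesis is stated. The following is a sketch of the standard argument, not a reproduction of the book's derivation; it assumes that z(t) is wide-sense stationary and mean-square differentiable, so that differentiation and expectation may be interchanged.

```latex
% Assumption: z(t) is wide-sense stationary and mean-square differentiable.
% Stationarity makes the second moment E[z^2(t)] independent of t.
\begin{align*}
  E\left[z^{2}(t)\right] &= R(0) \quad \text{(constant in } t\text{)}, \\
  0 = \frac{d}{dt}\, E\left[z^{2}(t)\right]
    &= E\left[\frac{d}{dt}\, z^{2}(t)\right]
     = 2\, E\left[z(t)\, z'(t)\right], \\
  \text{hence}\quad E\left[z(t)\, z'(t)\right] &= 0 \quad \text{for all } t .
\end{align*}
```

The identity says only that z(t) and z'(t) are uncorrelated at the same instant. It carries no information about how the probability of the slope divides between positive and negative values, which is the point of the objection raised above.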

While the book contains a certain amount of information of great importance and 
usefulness, it is definitely not “a systematic development of fundamental topics,” as 
it is claimed to be.


AMERICAN STATISTICAL ASSOCIATION
REPORT OF THE BOARD OF DIRECTORS, 1958


As noted in the President’s Column in various issues of The American Statistician
during the year, the Association has experienced an unequaled increase in growth and
activity. While specific details and figures are given separately in the Report of the Secre- 
tary-Treasurer, it is apropos to note that this increase occurred throughout the period of 
“recession.” The mounting demand in research, industry, government, etc. for personnel 
trained in all scientific disciplines has been reflected in the greater interest in statistical and 
mathematical methods. The use of statistics as a tool exists in every field of human knowl- 
edge. The American Statistical Association will continue to lead in its role as a central 
source through which these new developments will be made known. 


Publications 


Continuing the study on publications and policy begun in 1957, the Board in 1958 
authorized the design of a questionnaire to be sent to a random sample of the membership. 
A total of 1,769 forms were sent, with the sample design split between members of the 
large Business and Economic Statistics Section and all others. The sample included foreign 
members in the same proportion as domestic. By year’s end, tabulation was almost com- 
plete. Much information will be available for discussion by the Board and Council, upon 
which to base decisions for future policy. The membership will be informed by further 
accounts in The American Statistician. 

Both the Journal and The American Statistician cost more in 1958. That is, their total 
expense exceeded the amounts originally budgeted. As usual, rising costs played a partial 
role here, although both periodicals carried more pages than the preceding year. 

The third consecutive Proceedings of the Business and Economic Statistics Section
were published in 1958, from the sessions given at the 1957 Annual Meeting in Atlantic
City. An edition of 1200 copies has completely sold out. These Proceedings are now
self-supporting, being sold at a price designed only to recover the cost. The 1957 Proceedings
contained papers and discussions from twenty-two different sessions. Publication of the
1958 edition has been approved and scheduled for completion in spring, 1959. 

Another Proceedings volume has also been authorized after careful study: the 1958 
Proceedings of the Social Statistics Section. This has been approved in response to re- 
quests by representatives of the Section. This book, like that of the Business and Eco- 
nomic Statistics Section, will be priced at cost. Sales to members will determine the suc- 
cess of the first edition and the appearance of future proceedings. 

The 1958 Membership Directory was mailed to all members in November. This Di- 
rectory contained approximately 6,600 listings or an increase of 22% over the 1954 edition. 
(This figure, as usual, is a gross number of members, net being about 6,400 at the end of 
1958.) The listings were expanded to include degrees, granting institutions and major 
fields of statistical interest. Although the new Directory contained many more listings and 
more information than previous versions, it is gratifying to note that costs were held down 
by the use of various economies, enabling the finished product to stay well under budget. 

Still another new publication for 1958 is the Abstracts Booklet. Heretofore, the sum- 
maries of papers presented at an Annual Meeting were printed in a subsequent issue of 
the Journal (usually June). However, in response to repeated requests from members for 
abstracts available in time for the meeting and with the cooperation and planning of the 
1958 Program Committee, the brief summaries of papers presented in Chicago were pub- 
lished in a separate volume and sold for a nominal charge at the registration desk. 

1958 has been notable for the establishment of a new journal in the physical and en- 
gineering sciences called TECHNOMETRICS, under the joint sponsorship of ASA and the 
American Society for Quality Control. The first issue was scheduled to appear in April, 
1959. Both societies have agreed to allocate the necessary funds for publication, and a 
Management Committee is studying various aspects prior to appearance of the first number.
Members of both societies will be able to subscribe to the Journal at a reduced rate.


Meetings


The Annual Meeting in Chicago, December 1958, contained 52 sessions. In addition 
to the general sessions arranged by the Program Committee, each of the five Sections organized
and developed sessions of primary interest to their members.

Several Regional and Sectional meetings were held also during 1958, sponsored by some 
chapters and sections. These were often co-sponsored by other societies. 

Future Annual Meetings are as follows: 


1959 Washington, D.C. Shoreham Hotel December 27-30 
1960 Palo Alto, California Stanford University Late August 
1961 New York City Roosevelt Hotel December 27-30 


The meeting sites for 1962 and 1963 are now under consideration, and will be early 
autumn meetings.


Activities


The thirty-eighth active chapter of the Association was chartered by the Board of Di- 
rectors when the Central Iowa Chapter was approved early in 1958. The Denver chapter 
changed its name to the Colorado-Wyoming Chapter. 

Three members attended the meeting of the International Statistical Institute in 
Brussels, under the grant from the Carnegie Corporation. This provides travel funds over 
a three-year period for attendance at international meetings of statistical or allied socie- 
ties. The three grantees in 1958 were B. G. Greenberg, Nathan Mantel and Donald C. 
Riley. 

In summary, the year has been one of growth, greater liaison with the Chapters and 
with other associations, and a growing recognition of the role of the American Statistical
Association as a meeting point in its journals and meetings for those professionally engaged
in many fields where statistics are used.


In contrast to expectations earlier in the year of a slight loss for 1958, the year closed
with a small addition to the cumulative surplus. Even though both the Journal and The
American Statistician cost more than the previous year and were accompanied by an in- 
crease in most expense items, the sharp rise in some income items more than offset these 
enlarged expenses. Particularly gratifying was the number of new members joining and, 
conversely, the relatively small number of old members who were dropped for various 
reasons. Non-member subscriptions, sales of publications and advertising all showed sub- 
stantial increases over amounts budgeted. 

The amount added at the end of 1958 to the reserve is $3,800. Total income reached 
$84,368 and expenses took $80,562. The 1958 budget contained an amount of $2,620 to be 
withdrawn from reserves, contingent upon publications expense. This amount was not 
used; the funds added to surplus are the net of actual income and expense. The 1958
audit statement gives full details on each item in the budget.

It is pleasant to report that one expense came to considerably less than originally esti- 
mated: the 1958 Membership Directory. This was budgeted at $9,000, to be divided 
equally over the three-year period, 1957, 1958 and 1959. The amount of $3,000 was allo- 
cated to expense for each of the first two years, but because printing economies were de- 
veloped, the final 1959 portion will be very substantially below the original $3,000 alloca- 
tion. Advertising and sales will also further reduce this amount. 

In 1959, income and expense are estimated to be approximately equal, allowing for an
increase in Journal budget and estimating a proportionate rise in membership dues. With
a continuing effort on promotion, we expect to gain as many new members in 1959 as in
1958. Income is budgeted at $88,050; expense totals are estimated to reach $87,050. Since
these two totals are so close, no addition to cumulative surplus seems likely. This is regrettable
and may reinforce my prediction that ASA will be forced to follow other associations
in increasing dues. A recent study of other similar associations shows ASA dues below
the average.

Since the ASA share for starting the new journal TECHNOMETRICS will come from reserves,
the surplus fund will actually be reduced in 1959, although gradual reimbursement
of this sum is probably to be expected. 

The Association started 1958 with a total membership of 5,667. As mentioned above, 
this has been an outstanding year with respect to the addition of new members and re- 
newals. The number of persons joining the Association during 1958 exceeded one thousand; 
this is the largest yearly total to date. On the other hand, fewer persons dropped their 
membership than for many years past. The combination of many new members with 
fewer drop-outs has raised ASA’s membership to a new high. The actual figures are as 
follows: 1,037 new members and 18 reinstatements less 397 resignations, deaths and ter- 
minations for lack of payment of dues. Consequently, we start 1959 with a total of 6,325 
members. 
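
The membership figures quoted above reconcile exactly, as the following short Python check illustrates. The variable names are chosen for this sketch only; every number is taken from the report, and the percentage growth is a derived figure that the report itself does not state.

```python
# Figures as given in the Secretary-Treasurer's report for 1958.
opening_members = 5_667   # membership at the start of 1958
new_members = 1_037
reinstatements = 18
losses = 397              # resignations, deaths, and terminations for non-payment

closing_members = opening_members + new_members + reinstatements - losses
net_gain = closing_members - opening_members
growth_percent = 100.0 * net_gain / opening_members  # derived, not in the report

print(f"membership at start of 1959: {closing_members:,}")
print(f"net gain during 1958: {net_gain:,} ({growth_percent:.1f}%)")
```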

It has been a good year, but one where even careful management and considerable 
good fortune may not again offset the growing demands on an expanding and increasingly
recognized association.

DONALD C. RILEY
Secretary-Treasurer


ALEXANDER GRANT & COMPANY 
CERTIFIED PUBLIC ACCOUNTANTS
1025 CONNECTICUT AVENUE
WASHINGTON 6, D. C.


Board of Directors 
American Statistical Association 

We have examined the balance sheet of the AMERICAN STATISTICAL ASSOCIATION (a 
nonprofit organization) as of December 31, 1958, and the related statement of income and 
association equity for the year then ended. Our examination was made in accordance with 
generally accepted auditing standards and accordingly included such tests of the accounting
records and such other auditing procedures as we considered necessary in the circumstances.

In our opinion, the accompanying balance sheet and statement of income and associa- 
tion equity present fairly the financial position of the American Statistical Association at 
December 31, 1958, and the results of its operations for the year then ended, in conformity 
with generally accepted accounting principles applied on a basis consistent with that of 
the preceding year. 

Comments in regard to the scope of our examination and details of certain items shown 
in the balance sheet and statement of income and association equity are presented in the 
following paragraphs. 


BALANCE SHEET COMMENTS
Cash 


Balances comprising the total cash of $23,075 at December 31, 1958 are as follows: 
American Security and Trust Company 
Operating account 
Special account 
Petty cash 


Cash in banks and funds on deposit with savings and loan institutions were confirmed 
directly with the depositories. Cash in the amount of $13,550 was restricted for the unex- 
pended balances of the Ford Foundation and Travel Fund grants at December 31, 1958. 


Investments 


Income from funds on deposit with savings banks and savings and loan institutions 
and from investments in government bonds totaled $3,066, which represents an increase
of $472 over the preceding year. The average interest rate on invested funds for the year
ended December 31, 1958, was approximately 3.7%. 


Accounts receivable 


The following items make up the accounts receivable total of $5,787 at December 31, 
1958: 
Unremitted amount due for 1958 annual meeting 
Accrued interest on investments 
Amounts due for advertising in Journal 
Due from Social Science Research Council for article printed in December 1958


Inventories


In accordance with past practice, the cost of 100 copies of each of the four issues of the
Journal during 1958 was capitalized and added to the value of the inventory of past issues. 


Accounts payable—trade 


At December 31, 1958, accounts payable—trade of $11,188 included the following: 
George Banta Co., Inc. (printing and mailing costs related to December 1958 


The McCardle Printing Company, Inc. (printing and mailing costs related to 
December 1958 American Statistician) 


$11,188 


Deferred income 


Total deferred income at December 31, 1958, of $36,861 was $4,531 less than the corre- 
sponding amount at December 31, 1957. Of the total decrease, $1,212 was related to a 
reduction in advance payments for membership dues. This reduction primarily was caused 
by a later mailing of 1959 membership bills as compared with the prior year. 


Unexpended grants 


The Association is the recipient of two grants. The Ford Foundation Grant was initi- 
ated in 1956 for the purpose of indexing the Journal of the American Statistical Association 
for volumes 35 to 50. The second grant, which is called the Travel Fund Grant, was re- 
ceived in 1957 from the Carnegie Corporation to cover expenses incurred on behalf of 
sending delegates to international meetings. 

Analysis of these two grants for 1958 follows: 

                                             Ford          Travel 
                                          Foundation        Fund 
                                            Grant          Grant 

Unexpended balance—January 1, 1958         $13,037         $8,932 
Disbursements made during the year           6,846          1,573 

Unexpended balance—December 31, 1958       $ 6,191         $7,359 


The major portion of the Ford Foundation Grant disbursements represented payments 
made to the Virginia Polytechnic Institute for work performed on the index project. Of 
the $1,573 disbursements charged to the Travel Fund Grant, $1,469 was incurred incident 
to sending three members of the Association to the special session of the International 
Statistical Institute held in Brussels in September 1958. 
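
The grant analysis above can be checked with a short calculation. The following is a minimal sketch in Python, not part of the auditors' report; the dictionary layout and the helper name `closing_balance` are illustrative assumptions. The opening balances and disbursements are the figures reported in the grant analysis and in the Unexpended Grants lines of the balance sheet.

```python
# Figures in whole dollars, as reported for the year ended December 31, 1958.
# The structure and names below are illustrative assumptions.
grants = {
    "Ford Foundation Grant": {"opening": 13_037, "disbursed": 6_846},
    "Travel Fund Grant": {"opening": 8_932, "disbursed": 1_573},
}


def closing_balance(opening: int, disbursed: int) -> int:
    """Unexpended balance at year end: opening balance less disbursements."""
    return opening - disbursed


closing = {name: closing_balance(**g) for name, g in grants.items()}
total_opening = sum(g["opening"] for g in grants.values())
total_disbursed = sum(g["disbursed"] for g in grants.values())
total_closing = sum(closing.values())

for name, balance in closing.items():
    print(f"{name}: ${balance:,}")
print(f"Total unexpended grants: ${total_closing:,}")
```

The two closing balances should agree with the Unexpended Grants lines of the balance sheet, and their sum with the subtotal shown there.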


INCOME AND ASSOCIATION EQUITY COMMENTS 
Income 


Total income for the year 1958 was $10,969 higher than the preceding year. The major 
contributing factor in this increase was the larger number of paid members for 1958 
as compared to 1957. This factor alone accounted for slightly more than 50 per cent 
of the above mentioned increase. 


Expenses 


The increase of $14,941 in 1958 expenses compared with 1957 is largely the result of 
higher costs related to printing and mailing the various publications of the Association. 
The increase in publication costs of $8,719 was primarily for the Journal. 
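
The income and expense comments can be tied to the change in Association Equity. The sketch below is an illustrative check in Python, not part of the auditors' report; the variable names are assumptions, and the 1958 expense total is derived here from the reported 1957 expenses plus the stated increase rather than read from the statement.

```python
# Figures in whole dollars taken from the comments and the comparative statements.
income_1958 = 84_368
income_1957 = 73_399
expenses_1957 = 65_621
expense_increase = 14_941          # stated increase in 1958 expenses over 1957
equity_begin_1958 = 45_553         # Association Equity, beginning of year

# Derived, not reported directly in the text: total 1958 expenses.
expenses_1958 = expenses_1957 + expense_increase

income_increase = income_1958 - income_1957
net_1957 = income_1957 - expenses_1957
net_1958 = income_1958 - expenses_1958
equity_end_1958 = equity_begin_1958 + net_1958

print(f"Increase in income:        ${income_increase:,}")
print(f"Excess of income, 1957:    ${net_1957:,}")
print(f"Excess of income, 1958:    ${net_1958:,}")
print(f"Change in excess:          ${net_1958 - net_1957:,}")
print(f"Association Equity, end:   ${equity_end_1958:,}")
```

Under these assumptions the derived figures agree with the statement: the excess of income over expenses falls by $3,972, and the ending equity equals the beginning equity plus the 1958 excess.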


GENERAL 


During the course of our audit, we reviewed various aspects of the Association’s accounting 
and related office clerical procedures. Our findings and recommendations are reported 
in a supplementary letter. 

ALEXANDER GRANT & COMPANY 
Washington, D. C. 
March 27, 1959 


AMERICAN STATISTICAL ASSOCIATION 
COMPARATIVE BALANCE SHEET 
DECEMBER 31, 1958 AND 1957 


                                                         December 31,        Increase 
                                                        1958       1957     (decrease) 
Assets 
Current Assets 
Investments 
United States savings bonds, series G, due 1962, 
United States Treasury 23% bonds, $10,000 face 
amount, due November 1961, at cost.........            9,567      9,567 
Federal Land Bank 4% bonds, due May 1962, at cost      5,000      5,000 
Federal National Mortgage Association bonds, 
matured April 1958, at cost.................                      5,000      (5,000) 
Funds on deposit in savings account and savings and 
Inventories—at approximate cost 
Monograph of Kinsey Report..................           2,689      2,747         (58) 
Fixed Assets—At Cost 
Less accumulated depreciation.................         6,088      5,711         377 


                                                     $ 1,905    $ 2,234       ($329) 


                                                    $112,766   $120,128     ($7,362) 


The accompanying report letter is an integral part of this balance sheet. 


                                       December 31,        Increase 
                                      1958       1957     (decrease) 


Liabilities 
Current Liabilities 
Accounts payable—trade             $11,188    $ 7,946      $ 3,242 
New York Chapter                     1,050                    (125) 
Washington, D. C. Chapter              474                       6 
Biometrics Society                       4                  (1,328) 
Payroll and excise taxes               280                     (13) 


Total Current Liabilities          $12,996    $11,214      $ 1,782 
Deferred Income 


Membership dues 25,127 26,339 (1,212) 
Subscriptions 
Journal of the American Statistical Association.... 7,486 
American Statistician 306 
3,515 
3,449 3,746 


$ 36,861 $ 41,392 ($4,531) 
Unexpended Grants 


Ford Foundation Grant                6,191     13,037       (6,846) 
Travel Fund Grant                    7,359      8,932       (1,573) 


                                  $ 13,550   $ 21,969      ($8,419) 
Association Equity                  49,359     45,553        3,806 


                                  $112,766   $120,128      ($7,362) 


The accompanying report letter is an integral part of this balance sheet. 


AMERICAN STATISTICAL ASSOCIATION 
COMPARATIVE BALANCE SHEET 


DECEMBER 31, 1958 AND 1957 
Year ended 
December 31, Increase 
1958 1957 (decrease) 


Income 


Dues—old members $41,523 

—new members 3,874 

Subscriptions—Journal 13,337 

—American Statistician 541 

Advertising—Journal 2,904 

—American Statistician 918 

Sales—Journal 2,646 

—American Statistician 198 
—1958 Membership Directory 

—Business and Economic Section Proceedings... . 3,019 

427 

1,418 

2,594 


$84,368 $73,399 $10,969 


$ 2,835 
3,110 
1,344 
36 
572 
104 
43 
(49) 
405 
1,412 
(91) 
776 
472 
19,550 17,578 
Pension plan 
Promotion activity 


Travel and secretary’s expense 
Supplies and office expense 
Postage and delivery charges 
Telephone and telegraph 
Accounting services 
Committee expense 
Depreciation 


Repairs and maintenance 
Insurance 

Annual meeting—net 
Publications 


$65,621 


$7,778 ($3,972) 
37,775 7,778 


$45,553 $ 3,806 


The accompanying report letter is an integral part of this statement. 
Expenses 
1,972 
480 
308 
151 169 (18) 
142 (281) 423 
8,719 
Association Equity—Beginning of Year................. 45,553 
Association Equity—End of Year...................... $49,359 


Computer Programmers: 


IBM offers attractive career opportunities to versatile, imaginative 
programmers who want to break new ground in the fast-growing 
electronic computer field. You’ll have unusual professional freedom 
... work with specialists of diverse backgrounds . . . have access to 
a wealth of systems know-how. Whether you like to work inde- 
pendently or as a member of a small team, your contributions 
and achievements will be quickly recognized. 


ASSIGNMENTS NOW OPEN INCLUDE . . . 


Math D-. 


gr : to specify and program elements of a sophisticated 
automatic programming system. 


Operational Programmer: to develop computer program techniques for real-time 
military applications using game theory and systems simulation. 


Senior Programmer: to analyze engineering problems and develop machine 
programs for their solutions; to develop digital programs for simulating 
bombing and navigational problems. 


Programmer: to write differential equations of circuit diagrams; to develop 
mathematical models of nuclear reactors; to investigate real-time control 
systems using high-speed digital and/or analog computers. 

Diagnostic Programmer: to prepare diagnostic programs for real-time computers 
which will check for computer malfunction, diagnosing source of error for 
correction. 


Systems Programmer: to generate efficient and unique logical programs for real- 
time control computers; to develop automatic FORTRAN-like coding systems 
for systems programs. 


Mathematician: to handle mathematical analysis and 704 programming to solve 
systems problems, differential equations, probability-type problems, photo-geometry 
problems. 


704 Programmer: to analyze, program and code problems such as system simula- 
tion; to solve ordinary differential equations and numerical approximation 
of integrals. 


Qualifications: B.S., M.S., Ph.D. in Mathematics or the Physical Sciences. 
For details, write, outlining your background and interests, to: 

Mr. R. E. Rodgers, Dept. 577 

IBM Corporation 


590 Madison Avenue 
New York 22, New York 


IBM. 


INTERNATIONAL BUSINESS MACHINES CORPORATION 
THE 
AMHERST 
LABORATORY 


Sylvania’s Center for 
Research and Development 


. . . stresses basic and applied research on mathematical 
topics related to communications. Broad areas 
of mathematics being explored in current investigations 
include: 


FINITE GROUP THEORY 
NUMBER THEORY 
STATISTICAL DECISION THEORY 
FOURIER ANALYSIS OF CODED TRANSMISSIONS 
STATISTICAL THEORY OF COMMUNICATIONS & NOISE 
THEORETICAL & APPLIED PROBABILITY THEORY 


Changing interests 
of members of the Mathematics Section are frequently 
reflected in the topics under study. 

Immediate openings 
exist for mathematicians at several levels of training 
and experience. 

Location 

of the Amherst Laboratory is in residential Williamsville, 
northeast of Buffalo. 


Address confidential inquiries to 
Dr. R. L. San Soucie 
Amherst Laboratory / SYLVANIA ELECTRONIC SYSTEMS 


A Division of 


SYLVANIA 


1122 Wehrle Drive, Amherst 21, N. Y. 
MATHEMATICIANS - STATISTICIANS 


equate perfection with 


1. Exciting assignments at Cape Canaveral 
2. Opportunity for professional advancement 
3. Pleasant, relaxing Florida living 


IF and error systems analysis are challeng- 
ing to you. 

IF strani the state of the art in data processing techniques 
is intriguing to you. 

IF you’ve been looking for the “different”, the ultimate in a 
professional position . . . 

IF your field is: econo Analysis... Mathematical & Statis- 
tical Techniques ... Astronomy . . . Computer Applications or 
Computer Programming... 

THEN you belong with RCA in Florida! You can make the 
move right now... by contacting: 


Mr. D. A. Schindler 

Professional Placement Representative 
RCA Service Company, Dept. N-12F 
Atlantic Missile Range 

Mail Unit 114 

Patrick Air Force Base, Florida 


RCA SERVICE COMPANY 


A DIVISION OF RADIO CORPORATION OF AMERICA 
® 
FLORIDA 


AN INVITATION 
TO JOIN ORO 


Pioneer In Operations Research 


Operations Research is a young science, earning 
recognition rapidly as a significant aid to decision-making. 
It employs the services of mathematicians, 
physicists, economists, engineers, political scientists, 
psychologists, and others working on teams to synthesize 
all phases of a problem. 


At ORO, a civilian and non-governmental organization, 
you will become one of a team assigned to 
vital military problems in the area of tactics, strategy, 
logistics, weapons systems analysis and communications. 


No other Operations Research organization has 
the broad experience of ORO. Founded in 1948 by 
Dr. Ellis A. Johnson, pioneer of U. S. Opsearch, 
ORO’s research findings have influenced decision- 
making on the highest military levels. 


ORO’s professional atmosphere encourages those 
with initiative and imagination to broaden their 
scientific capabilities. 

ORO starting salaries are competitive with those 
of industry and other private research organizations. 
Promotions are based solely on merit. The “fringe” 
benefits offered are ahead of those given by many 
companies. 


The cultural and historical features which attract 
visitors to Washington, D. C. are but a short drive 
from the pleasant Bethesda suburb in which ORO is 
located. Attractive homes and apartments are within 
walking distance and readily available in all price 
ranges. Schools are excellent. 


For further information write: 


OPERATIONS RESEARCH OFFICE 


The Johns Hopkins University 


6935 ARLINGTON ROAD 
BETHESDA 14, MARYLAND 


Please mention the Journal of the American Statistical Association in writing advertisers 


STATISTICIAN 


Detroit Research Laboratories has opening in Statistical 
Planning Group. Requires person with M.S. or Ph.D. degree 
in statistics and two to five years’ industrial experience in 
engineering, chemical or physical applications. Principal 
duties involve the planning and analysis of experimental 
work of a diverse nature encountered in our automotive and 
chemical research laboratories. The data are often charac- 
terized by high variability and high cost. Two digital com- 
puters are available to the group. For more particulars 
write to: 


Personnel Manager 
Ethyl Corporation 
Research Laboratories 
1600 W. Eight Mile Road 
Ferndale 20, Michigan 


Mathematical Statisticians 


Exceptional opportunities exist at the Naval Proving Ground for mathematical 
statisticians with MS and PhD degrees and an interest in operations research. 
The principal efforts of the Operations Research Group at present are devoted to 
the formulation and execution of extensive programs in the areas of target analysis, 
weapons system analysis, and missile feasibility and evaluation. Senior Statisticians 
on the staff also serve as consultants in areas of statistical inference, 
probability, and experimental design. The most advanced computing equipment 
and capable junior scientists are available for assistance. Starting salaries 
range from $7,510 to $11,595 per annum. The Naval Proving Ground provides an 
excellent work atmosphere and, in addition, the advantages of living in a 
pleasant small community with economical housing and many recreational 
facilities. 


For further information, write to the Director, 
Computation and Analysis Laboratory. 


NPG 


U. S. Naval Proving Ground 


Department of the Navy • Dahlgren, Virginia 
As a long term prime contractor for the 
AEC we design, develop and manufacture 
extremely precise, complex electronic and 
electro-mechanical devices. Our program 
enables us to offer these two excellent 
“ground floor” opportunities. They afford 
many professional and personal advan- 
tages well worth your immediate investi- 
gation. Kansas City is a beautiful and pro- 
gressive city, featuring convenience, mod- 
erate climate, modern, uncrowded and 
highly-rated schools, plus many recrea- 
tional and cultural activities. Assistance 
program for advanced study at local uni- 
versities if desired. 


STATISTICAL ANALYST 


. . . familiar with the underlying concept 
(and mathematics) of process quality control, 
sampling procedures, and experimental 
design and analysis. Several years 
experience in industry together with a degree 
in statistics or a degree in mathematics 
or engineering with several courses 
in statistics, is required. This position 
requires close coordination with systems 
engineers in setting up statistical quality 
control programs to insure reliability of 
the end product. 


MATHEMATICAL STATISTICIAN 


. .. with an advanced degree and familiar 
with nonparametric and order statistics, 
Monte Carlo procedures and Markoff proc- 
esses. Experience in implementing these 
methods is desirable, as is some familiar- 
ity with computers, particularly the IBM 
650. Your ability to recognize and formu- 
late problems is of paramount importance. 
Unique contributions in the fields of operations 
research, reliability evaluation 
and prediction, and statistical decision 
theory are expected. 


Mail brief, confidential resume to: 


Mr. T. H. Tillman 
Box 303-HU, Bendix, Kansas City, Mo. 


KANSAS CITY, MISSOURI 


OPERATIONS RESEARCH 


Immediate openings for scientists of ability, imagination and initiative in a 
group performing operations research studies on a wide variety of 
industrial and military problems. 


Operations Research Analysis 


Requirements: Academic training (Master's level) and/or operations research 
experience. Ability to deal with people and assume project leadership. 


Statistical Analysis 
Requirements: Master’s degree in mathematical statistics. Knowledge of sam- 
pling and decision theories. Operations research experience desirable. 
Computing-Data Processing Analysis 


Requirements: Ability, experience and interest in analyzing decision problems 
for computer — and in data processing systems analysis. IBM 704 or 


705 experience helpful. 


For details, write to 
Mr. J. E. Farrell 
RESEARCH DEPARTMENT 


UNITED AIRCRAFT CORPORATION 


400 Main Street East Hartford 8, Conn. 
MARKET RESEARCH 
TABULATING 


STATISTICAL service on market research 
tabulating begins long before a 
button is pushed. 


You get preliminary assistance in resolving 
your ideas . . . in translating sound thinking 
into well-planned questionnaires for the most 

practical and economical processing. 


There is always a best way to handle any 
assignment and STATISTICAL can help you apply 
it through long experience in methods 

and procedures. 

The same careful approach is used in processing 
data to assure highest quality in market information. 
Strict controls are maintained every step of the way 
from editing and coding to finished report. 

And this professional service is available to you 
days, nights, week-ends—any time you need it. 


Write for details today 


STATISTICAL TABULATING CORPORATION 
Established 1933 - Michael R. Notaro, President 
53 West Jackson Blvd., Chicago 4, Illinois 
Phone: HArrison 7-4500 
TABULATING • CALCULATING • TYPING 


TEMPORARY OFFICE PERSONNEL 


Offices: Chicago • New York • St. Louis • Newark • Cleveland • Los Angeles 
Kansas City • Milwaukee • San Francisco 


A complete 
tabulating service 
Billing 

Sales Analysis 

Payrolls 

Pension Planning 


Market Research 


JOHN FELIX 
ASSOCIATES 


INCORPORATED 


3 EAST 54TH STREET • NEW YORK 22, NEW YORK 


PLaza 1-2050 
New McGRAW-HILL Books 


A PRIMER OF PROGRAMMING 
FOR DIGITAL COMPUTERS 
By Marshal H. Wrubel, Indiana University. 230 pages, $7.50 


An introductory text, designed for junior-senior courses for physical scientists, engineers, 
and all other students who have problems to solve on computers. The purpose of the book 
is to explain how to go about setting up a problem for a digital computer, how to test it, 
and how to make it available to others. The primer discusses procedures common to 
digital electronic machines, but an actual machine, the IBM Type 650, is used for 
examples. 


PROBABILITY AND STATISTICS FOR 

BUSINESS DECISIONS 

An Introduction to Managerial Economics Under Uncertainty 
By Robert Schlaifer, Harvard University. 732 pages, $11.50 


A nonmathematical introduction to the logical analysis of practical business problems in 
which a decision must be reached under uncertainty. The analysis which it recommends is 
based on the modern theory of utility and what has come to be known as the “personal” 
definition of probability. Exercises are provided at the end of each chapter. 


HIGH-SPEED DATA 
PROCESSING 


By C. C. GOTLIEB and J. N. P. HUME, both 
of the University of Toronto. McGraw-Hill 
Series in Information Processing and Computers. 
338 pages, $9.50 


A basic, comprehensive treatment of the im- 
portant principles and general techniques of 
processing data at high speeds. It shows how 
data processors work, how to use them, and 
what their advantages are. Coding and pro- 
gramming methods are included and ex- 
amples of typical applications of high-speed 
data processing are shown. 


APPLIED STATISTICS 
FOR ENGINEERS 


By WILLIAM VOLK, Hydrocarbon Research, 
Inc., Princeton, N.J. McGraw-Hill Series in 
Chemical Engineering. 354 pages, $9.50 


Provides enough background and examples 
of statistical problems to enable practicing 
engineers to apply statistical analysis to 
their data. Emphasizing applications and 
providing many illustrative examples, it 
deals with the treatment of engineering data 
for correlation, precision, and analysis of 
experimental factors. Each chapter is complete 
in itself. 


Send for copies on approval 


McGRAW-HILL BOOK COMPANY, INC. 


330 WEST 42nd STREET, 


NEW YORK 36, N.Y. 
A Leading Text in the Field . . . 


FUNDAMENTAL STATISTICS 
For Business and Economics 


John Neter, University of Minnesota; and 
William Wasserman, Syracuse University 


This highly successful text presents a modern approach to statistics, emphasizing 
the application of statistical techniques (including the “probability 
approach”) as tools for decision making in business and economics. 
The text is non-mathematical and stresses the concepts underlying 
statistical methods and their application through the use of actual case 
examples. In addition, realistic problems and questions at the end of 
each chapter enable the student to apply statistical techniques to practical 
situations. Instructor’s Manual available. 


For examination copy, write to Arthur B. Conant: 


ALLYN AND BACON College Division 
150 Tremont Street, Boston 11, Mass. 


Just published! 


A masterly and creative synthesis 


A GENERAL THEORY OF THE PRICE 
LEVEL, OUTPUT, INCOME DISTRIBU- 
TION AND ECONOMIC GROWTH 


By Professor Sidney Weintraub, 